(57) A hypertext document and the anchor sentences of its parent documents are registered with a hypertext document identifier as document information for each of a plurality of hypertext documents having reference relationships with each other. A user can refer to one hypertext document through an anchor sentence of another hypertext document functioning as a parent document. Also, the occurrence positions of each word in the hypertext documents and parent documents are registered as word information for each of the words. When a keyword is input, a plurality of particular hypertext documents and particular parent documents in which the keyword appears are specified according to the word information. For each particular hypertext document, the particular hypertext document and its corresponding particular parent documents are unified into a unified hypertext document, and an occurrence frequency of the keyword in each unified hypertext document is calculated according to the document information. The importance degrees of the unified hypertext documents are then calculated, according to the occurrence frequencies, as those of the particular hypertext documents, and the ranking of the particular hypertext documents is determined according to those importance degrees. Because the occurrence frequency is calculated by taking the parent documents into account, the particular hypertext documents can be appropriately ranked.



FIG. 3 
Description

BACKGROUND OF THE INVENTION
1. FIELD OF THE INVENTION:

The present invention relates generally to a hypertext document retrieving apparatus, and more particularly to a hypertext document retrieving apparatus in which a plurality of hypertext documents likely to meet a user's retrieval request are retrieved from a large volume of hypertext documents and are presented to the user.

2. DESCRIPTION OF THE RELATED ART:

2.1. PREVIOUSLY PROPOSED ART: 

As a conventional apparatus in which one or more documents likely to meet a user's retrieval request are retrieved from a large volume of documents and are presented to the user, a document retrieving apparatus 200 shown in Fig. 1 is known. In this apparatus 200, a large volume of documents stored in a document managing unit 201 are analyzed in advance in a retrieval index developing unit 202, which examines how many times each of a plurality of words registered in its dictionary appears in each of the documents. That is, an occurrence frequency of each word is calculated for each of the documents stored in the document managing unit 201; a deviation degree IDF of each word over all the documents is calculated as a correction factor for that word; a normalized occurrence frequency (called a TF value) of each word is calculated for each of the documents; an estimated value TF*IDF of each document is calculated for each of the words by multiplying the deviation degree and the normalized occurrence frequency together; and a retrieval index is developed in the retrieval index developing unit 202. In the retrieval index, a set consisting of one word, identification data indicating the one or more documents in which the word appears, and the estimated value of each such document for the word is registered for each of the words.
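The index-development step can be sketched in Python. This is a minimal illustration under stated assumptions, not the apparatus 200 itself: documents are assumed to be already split into word lists, TF and IDF follow the calculation method described later in this section (TF = Fo/Nwd, IDF = 1 - log(Nw/N)), Nwd is taken to be the total number of words in the document, and the natural logarithm is assumed because the base is not stated. The function name `build_retrieval_index` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def build_retrieval_index(documents):
    """Build a retrieval index mapping each word to [(doc_id, TF*IDF), ...].

    documents: dict of document identifier -> list of words (already tokenized).
    Assumptions: Nwd = total words in the document; natural logarithm.
    """
    n_docs = len(documents)                      # N
    doc_freq = Counter()                         # Nw for each word
    for words in documents.values():
        doc_freq.update(set(words))

    index = {}
    for doc_id, words in documents.items():
        counts = Counter(words)                  # Fo for each word in this document
        n_words = len(words)                     # Nwd
        for word, fo in counts.items():
            tf = fo / n_words
            idf = 1 - math.log(doc_freq[word] / n_docs)
            index.setdefault(word, []).append((doc_id, tf * idf))
    return index

# Hypothetical, pre-tokenized documents.
docs = {
    "doc1": ["apple", "farmer", "apple"],
    "doc2": ["apple", "pie"],
    "doc3": ["farmer", "market"],
}
index = build_retrieval_index(docs)
```

Each entry of `index` plays the role of the registered set: the word, the documents in which it appears, and each document's estimated value.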

Thereafter, when a plurality of keywords input by a user 207 are received in a keyword input unit 203, the keywords are transmitted to a retrieving unit 204. In the retrieving unit 204, a plurality of retrieval words agreeing with the input keywords are found in the retrieval index stored in the retrieval index developing unit 202. For each of the retrieval words, a particular set consisting of the retrieval word, identification data indicating one or more retrieval documents in which the retrieval word appears, and the estimated value for the retrieval word is taken out of the retrieval index developing unit 202, and the particular sets corresponding to the keywords are transmitted to a document ranking determining unit 205.

In the document ranking determining unit 205, a plurality of identification titles indicating the retrieval documents are arranged in decreasing order of the estimated values of the retrieval documents to determine the ranking of the retrieval documents, and the identification titles arranged according to that ranking are displayed as a retrieval result in a retrieval result displaying unit 206. Thereafter, when the user selects the identification titles displayed on the displaying unit 206 one after another in the arranged order, each time one identification title is selected, the retrieval document indicated by the selected identification title is read out from the document managing unit 201 and displayed on the retrieval result displaying unit 206.

Therefore, when the user inputs keywords expressing the user's retrieval request, a plurality of documents likely to meet that request can be presented in decreasing order of the estimated value TF*IDF.
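The ranking step of the conventional apparatus can be sketched as follows. The index contents, the titles and the function `rank_by_estimated_value` are hypothetical; in particular, the text does not say how the estimated values for several keywords are combined for one document, so summing them is an assumption of this sketch.

```python
from collections import defaultdict

def rank_by_estimated_value(index, keywords, titles):
    """Return identification titles in decreasing order of estimated value.

    index:  word -> list of (doc_id, estimated TF*IDF value)
    titles: doc_id -> identification title displayed to the user
    Assumption: a document matched by several keywords gets the sum of
    its estimated values (the text does not state how they are combined).
    """
    scores = defaultdict(float)
    for keyword in keywords:
        for doc_id, value in index.get(keyword, []):
            scores[doc_id] += value
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [titles[doc_id] for doc_id in ranked]

# Hypothetical retrieval index and identification titles.
index = {
    "apple": [("d1", 0.9), ("d2", 0.7)],
    "farmer": [("d3", 0.8), ("d2", 0.4)],
}
titles = {"d1": "Apple varieties", "d2": "Apple farmers", "d3": "Farm life"}
result = rank_by_estimated_value(index, ["apple", "farmer"], titles)
```

Here "Apple farmers" comes first because it matches both keywords.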

A plurality of methods of calculating the estimated value TF*IDF are known. In one such method, the deviation degree IDF (= 1 - log(Nw/N)) is defined by subtracting the logarithm of the ratio Nw/N from 1. Here, the symbol Nw denotes the number of documents in which a target word appears, and the symbol N denotes the number of documents stored in the document managing unit 201. Also, the normalized occurrence frequency TF (= Fo/Nwd) is defined by dividing the occurrence frequency Fo of the target word in a target document by the number Nwd of words appearing in the target document. In this case, the estimated value TF*IDF is calculated by multiplying the deviation degree and the normalized occurrence frequency together.
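Written as formulas, the definitions above are as follows. The logarithm base is not stated in the text; the numerical example assumes base 10 and uses hypothetical counts.

```latex
\mathrm{IDF} = 1 - \log\frac{N_w}{N}, \qquad
\mathrm{TF} = \frac{F_o}{N_{wd}}, \qquad
\mathrm{TF}\cdot\mathrm{IDF} = \frac{F_o}{N_{wd}}\left(1 - \log\frac{N_w}{N}\right)

% Example (assumed base-10 log; hypothetical N = 1000, N_w = 10, F_o = 3, N_{wd} = 300):
\mathrm{IDF} = 1 - \log_{10} 0.01 = 3, \quad
\mathrm{TF} = 3/300 = 0.01, \quad
\mathrm{TF}\cdot\mathrm{IDF} = 0.03
```

Because N_w never exceeds N, the logarithm is never positive, so IDF is at least 1 and grows as the word becomes rarer.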

The details of the estimated value TF*IDF and of a conventional document retrieving apparatus in which the estimated value TF*IDF is used are disclosed in the literature: Salton, Gerard: "Introduction to Modern Information Retrieval", McGraw-Hill Computer Science Series, 1983.

2.2. PROBLEMS TO BE SOLVED BY THE INVENTION:

However, when one or more particular hypertext documents likely to meet a user's retrieval request are retrieved from a large volume of hypertext documents by using the conventional document retrieving apparatus, the ranking of those particular hypertext documents cannot be appropriately determined, because hypertext documents are generally not independent of each other but often have reference relationships with each other. That is, because the contents of a plurality of particular hypertext documents having a referential relationship with each other are often connected with a consistent meaning, those contents cannot be understood by reading only one of the particular hypertext documents, but only by reading all of them. Therefore, when the conventional document retrieving apparatus is used to retrieve particular hypertext documents likely to meet a user's retrieval request, the importance degree of each particular hypertext document is erroneously estimated, so that the ranking of the particular hypertext documents cannot be appropriately determined. Furthermore, even though the particular hypertext documents are displayed ranked according to their estimated values, the ranking is inappropriate, so that the user cannot smoothly select the particular hypertext documents in an appropriate order of importance degree.

In particular, because hypertext documents written in the hypertext mark-up language (HTML) on the world wide web are highly likely to have referential relationships with each other, the ranking of such particular hypertext documents cannot be appropriately determined, and the user cannot smoothly select each of the particular hypertext documents even though they are displayed ranked according to their estimated values.



SUMMARY OF THE INVENTION



An object of the present invention is to provide, with due consideration to the drawbacks of such a conventional document retrieving apparatus, a hypertext document retrieving apparatus in which one or more hypertext documents likely to meet a user's retrieval request are retrieved from a large volume of hypertext documents and are appropriately ranked according to their importance degrees, so that the user can smoothly select each of the hypertext documents even when the hypertext documents are written in the hypertext mark-up language on the world wide web.

To achieve this object, in a hypertext document retrieving apparatus according to the present invention, a plurality of particular hypertext documents likely to meet a user's retrieval request are retrieved from a group of hypertext documents having reference relationships with each other, in which one hypertext document having an anchor sentence functions as a parent document for another hypertext document functioning as a reference document, and a user refers to a reference document after selecting the corresponding anchor sentence of a parent document.

In detail, in hypertext document table preparing means, hypertext document information, in which a hypertext document identifier identifying one hypertext document, the body of the hypertext document, a parent document identifier identifying a parent document for the hypertext document (which functions as a reference document), and the anchor sentence of that parent document are registered, is prepared for each of the hypertext documents, and a hypertext document table of the hypertext document information for all hypertext documents is prepared in advance.
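A piece of hypertext document information and the table that collects it might be modelled as below. This is a hypothetical data layout for illustration only: the class and field names, and the numeric identifiers 81 and 83 (chosen to echo documents D81 and D83 of Fig. 2), are not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ParentEntry:
    parent_id: int        # parent document identifier
    anchor_sentence: str  # anchor sentence in the parent that refers to this document

@dataclass
class HypertextDocumentInfo:
    doc_id: int           # hypertext document identifier
    body: str             # body of the hypertext document
    parents: list[ParentEntry] = field(default_factory=list)

def build_table(infos):
    """Hypertext document table: identifier -> hypertext document information."""
    return {info.doc_id: info for info in infos}

# Hypothetical entries echoing the D81 -> D83 reference in Fig. 2.
table = build_table([
    HypertextDocumentInfo(81, "We are apple producing farmers."),
    HypertextDocumentInfo(83, "Apple that I grew.",
                          [ParentEntry(81, "apple producing farmer")]),
])
```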

Thereafter, in retrieval index preparing means, a plurality of words appearing in each of the hypertext documents and the parent documents, and a plurality of occurrence positions of those words in each of the hypertext documents and the parent documents, are recognized according to the hypertext document table prepared by the hypertext document table preparing means. Word information, composed of one or more occurrence document identifiers identifying one or more hypertext documents in which one word appears and the occurrence positions of the word in those hypertext documents and in one or more anchor sentences of one or more parent documents corresponding to those hypertext documents, is prepared for each of the words, and a retrieval index of the pieces of word information for the words is prepared in advance.

Thereafter, when a keyword indicating the user's retrieval request is received in keyword receiving means, particular word information corresponding to the keyword is retrieved in retrieving means from the retrieval index prepared by the retrieval index preparing means. Also, a plurality of particular occurrence document identifiers identifying a plurality of particular hypertext documents in which the keyword appears, and a plurality of particular occurrence positions of the keyword in the particular hypertext documents and in one or more particular anchor sentences of one or more particular parent documents corresponding to the particular hypertext documents, are retrieved from the particular word information.
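The retrieval index and the keyword lookup described above can be sketched in Python. As simplifying assumptions, whitespace tokenization stands in for the dictionary-based analysis, positions are word offsets tagged with their source ("body" or the k-th parent anchor sentence), and the table layout and the functions `build_word_index` and `retrieve` are hypothetical.

```python
from collections import defaultdict

def build_word_index(table):
    """Retrieval index: word -> {doc_id: [(source, position), ...]}.

    table: doc_id -> {"body": str, "anchors": [anchor sentences of parents]}
    source is "body" or "anchor<k>" (k-th parent anchor sentence);
    position is the word offset within that source.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, info in table.items():
        for pos, word in enumerate(info["body"].lower().split()):
            index[word][doc_id].append(("body", pos))
        for k, anchor in enumerate(info["anchors"]):
            for pos, word in enumerate(anchor.lower().split()):
                index[word][doc_id].append((f"anchor{k}", pos))
    return index

def retrieve(index, keyword):
    """Particular word information for one keyword: doc_id -> occurrence positions."""
    return {doc_id: list(p) for doc_id, p in index.get(keyword.lower(), {}).items()}

# Hypothetical table: document 84's own body never says "apple", a parent's anchor does.
table = {
    83: {"body": "apple that I grew", "anchors": ["apple producing farmer"]},
    84: {"body": "prices of fruit", "anchors": ["more about apple"]},
}
index = build_word_index(table)
hits = retrieve(index, "Apple")
```

Document 84 is still retrieved for "apple", because the keyword occurs in the anchor sentence of one of its parent documents.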

Thereafter, in document ranking determining means, the particular hypertext documents identified by the particular occurrence document identifiers are specified, and pieces of particular hypertext document information for the particular hypertext documents are retrieved from the hypertext document table prepared by the hypertext document table preparing means. For each of the particular hypertext documents, the particular hypertext document and the one or more particular parent documents corresponding to it are unified into a unified hypertext document. An occurrence frequency of the keyword is calculated for each unified hypertext document, a plurality of importance degrees of the unified hypertext documents are determined according to these occurrence frequencies, the importance degree of each unified hypertext document is set as the importance degree of the particular hypertext document corresponding to it, and the ranking of the particular hypertext documents is determined according to the importance degrees of the unified hypertext documents.
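The unification step can be illustrated with a small sketch in which a document's body and the anchor sentences of its parents are concatenated, and the importance degree is simply the keyword's occurrence count in that unified text. Both the raw-count importance degree and the names used are assumptions; the first embodiment described below uses a TF*IDF-based estimate instead.

```python
def unify(body, parent_anchors):
    """Unified hypertext document: the document's body plus its parents' anchor sentences."""
    return " ".join([body, *parent_anchors])

def rank_by_unified_frequency(table, keyword):
    """Rank particular hypertext documents by keyword frequency in their unified documents.

    table: doc_id -> (body text, [anchor sentences of parent documents])
    Assumption: the importance degree is the raw occurrence count.
    """
    keyword = keyword.lower()
    importance = {}
    for doc_id, (body, anchors) in table.items():
        freq = unify(body, anchors).lower().split().count(keyword)
        if freq > 0:  # keep only documents whose unified text contains the keyword
            importance[doc_id] = freq
    return sorted(importance, key=importance.get, reverse=True)

# Hypothetical documents.
table = {
    "D83": ("my orchard this year", ["apple producing farmer", "apple grower"]),
    "D85": ("apple pie recipe", []),
    "D86": ("weather report", []),
}
ranking = rank_by_unified_frequency(table, "apple")
```

D83 ranks first although its own body never mentions "apple", because the anchor sentences of its parents do; this is the effect the unification is meant to capture.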

Thereafter, a plurality of indexes of the particular hypertext documents are displayed as a retrieval result by retrieval result displaying means, in a ranked order corresponding to the ranking of the particular hypertext documents.

Because a unified hypertext document is prepared for each of the particular hypertext documents by unifying the particular hypertext document with its one or more corresponding particular parent documents, and the importance degree of each unified hypertext document is used as the importance degree of the corresponding particular hypertext document, the ranking of the particular hypertext documents can be determined by taking into account the particular parent documents that have reference relationships with them. Therefore, even when the contents of a plurality of specific hypertext documents having a referential relationship with each other are connected with a consistent meaning, the specific hypertext documents likely to meet the user's retrieval request can be correctly retrieved from a large volume of hypertext documents and appropriately ranked according to their importance degrees, so that the user can smoothly select the specific hypertext documents in an appropriate order of importance degree even when they are written in the hypertext mark-up language on the world wide web.



The objects, features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram of a conventional document retrieving apparatus;

Fig. 2 shows a reference relationship among a plurality of hypertext documents distributively managed in a world wide web of an internet;

Fig. 3 is a block diagram of a hypertext retrieving apparatus according to a first embodiment of the present invention;

Fig. 4 shows a hypertext document table of pieces of hypertext document information prepared in a hypertext document table with parent document list preparing unit shown in Fig. 3;

Fig. 5 shows a retrieval index of pieces of word information prepared in a retrieval index preparing unit shown in Fig. 3;

Fig. 6 is a block diagram of a hypertext retrieving apparatus according to a second embodiment of the present invention;

Fig. 7 shows an example of a retrieval result in which an index of one particular hypertext document is displayed with an index of a first-stage particular parent document and an index of a second-stage particular parent document for each particular hypertext document by a retrieval result displaying unit shown in Fig. 6;

Fig. 8 is a block diagram of a hypertext retrieving apparatus according to a third embodiment of the present invention;

Fig. 9 shows an example of a retrieval result in which indexes of a plurality of particular hypertext documents are displayed with an index of a first-stage particular parent document and an index of a second-stage particular parent document by a retrieval result displaying unit shown in Fig. 8;

Fig. 10 is a block diagram of a hypertext retrieving apparatus according to a fourth embodiment of the present invention;

Fig. 11 is a block diagram of a hypertext retrieving apparatus according to a fifth embodiment of the present invention;

Fig. 12 shows an example of a retrieval result in which an index of one particular hypertext document is displayed with a summary of the particular hypertext document, an index of a first-stage particular parent document and an index of a second-stage particular parent document for each particular hypertext document by a retrieval result displaying unit shown in Fig. 11;

Fig. 13 is a block diagram of a hypertext retrieving apparatus according to a sixth embodiment of the present invention;

Fig. 14 is a block diagram of a hypertext retrieving apparatus according to a seventh embodiment of the present invention;

Fig. 15 is a block diagram of a hypertext retrieving apparatus according to an eighth embodiment of the present invention;

Fig. 16 is a block diagram of a hypertext retrieving apparatus according to a ninth embodiment of the present invention;

Fig. 17 shows the division of a long hypertext document into one or more reference labels;

Fig. 18 is a block diagram of a hypertext retrieving apparatus according to a tenth embodiment of the present invention;

Fig. 19 shows an example of a retrieval result, in which indexes of hypertext documents and buttons corresponding to a plurality of high-ranking related words are displayed, according to the tenth embodiment;

Fig. 20 is a block diagram of a hypertext retrieving apparatus according to an eleventh embodiment of the present invention; and

Fig. 21 shows an example of a retrieval result, in which indexes of hypertext documents and buttons corresponding to a plurality of high-ranking related words are displayed, according to the eleventh embodiment.
DETAILED DESCRIPTION OF THE EMBODIMENTS

Preferred embodiments of a hypertext document retrieving apparatus according to the concept of the present invention, in which one or more particular hypertext documents likely to meet a user's retrieval request are retrieved from a large volume of hypertext documents distributively managed in a world wide web of an internet, are described with reference to the drawings.

Fig. 2 shows a reference relationship among a plurality of hypertext documents distributively managed in a world wide web of an internet.

As shown in Fig. 2, a plurality of hypertext documents D80 to D86 distributively managed in a world wide web of an internet have a referential relationship with each other. That is, an anchor sentence S800 is placed in the hypertext document D80, an anchor sentence S801 is placed in the hypertext document D81, an anchor sentence S802 is placed in the hypertext document D82, a plurality of anchor sentences S803 to S805 are placed in the hypertext document D83, and an anchor sentence S806 is placed in the hypertext document D84. Embedded in each anchor sentence is either an identifier identifying a document to which a user can make reference or the position of such a document.

In this specification, a document to which a user makes reference is called a reference document, and a document having an anchor sentence that indicates one or more reference documents is called a parent document. Also, each anchor sentence is composed of one sentence or a plurality of sentences.

Therefore, when a user reads the parent document D81 displayed on the display of browsed document selecting means (called a browser) and points to the position of the anchor sentence S801 of the parent document D81 with a so-called pointing device, the reference document D83 is called and displayed, so that the user can efficiently use the distributed hypertext documents D80 to D86.

A group of the hypertext documents D80 to D86 is written in a hypertext mark-up language; each hypertext document is called a page, and a character string, an image or a program is written in each hypertext document. For example, in cases where the parent document D81 is stored in a file named "farmer.html", the reference document D83 is stored in a file named "apple.html", and an indicator (or a document storing position) indicating a reference to the reference document D83 is embedded in a character string "apple producing farmer" written in the parent document D81 to form the anchor sentence S801, the anchor sentence S801 is expressed by <a href="apple.html">apple producing farmer</a>. In this case, nothing identifying its parent documents is written in the reference document D83. Therefore, the document D82 may be prepared on a computer far from the computer that stores the document D83 (which was prepared before the document D81), and the document D82 can still function as a parent document for the reference document D83.
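An anchor sentence such as S801 can be extracted from HTML with Python's standard `html.parser` module. The sketch below is one possible way to pair each link target with its anchor text; the class name `AnchorExtractor` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class AnchorExtractor(HTMLParser):
    """Collect (href, anchor sentence) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # list of (href, anchor text)
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for <a> without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical fragment of farmer.html containing the anchor sentence S801.
parser = AnchorExtractor()
parser.feed('<p>Visit the <a href="apple.html">apple producing farmer</a>.</p>')
```

After feeding the page, `parser.anchors` holds the reference target together with the anchor sentence that a parent-document list needs.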

(First Embodiment)

Fig. 3 is a block diagram of a hypertext retrieving 
apparatus according to a first embodiment of the 
present invention. 

As shown in Fig. 3, a hypertext retrieving apparatus 1 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in a hypertext document managing unit 8, in which the hypertext documents prepared in a large number of computers widely distributed in a network of a world wide web are distributively managed on condition that the hypertext documents have reference relationships with each other, comprises

a hypertext document table with parent document list preparing unit 7 for analyzing the hypertext documents having the reference relationships which are managed by the hypertext document managing unit 8; preparing, for each of the hypertext documents, hypertext document information in which one or more parent document identifiers identifying one or more parent documents and the anchor sentences of those parent documents are listed together with a hypertext document identifier identifying the hypertext document and the document storing position of the hypertext document; and preparing a hypertext document table of the hypertext document information for all hypertext documents managed by the hypertext document managing unit 8,
a retrieval index preparing unit 6 having a dictionary, for analyzing in advance, for each of the hypertext documents managed by the hypertext document managing unit 8 and according to the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, the body of the hypertext document, the title of the hypertext document and the character strings of the one or more anchor sentences of the one or more parent documents corresponding to the hypertext document, to recognize a plurality of words appearing in the hypertext documents; preparing, for each word registered in the dictionary, a piece of word information in which, for each hypertext document in which the word appears, an occurrence document identifier identifying that hypertext document and positional information indicating the occurrence positions of the word in the title of the hypertext document, the body of the hypertext document and the anchor sentences of the parent documents corresponding to the hypertext document are listed; and preparing a retrieval index of the pieces of word information for the words stored in the dictionary,
a keyword input unit 2 for receiving a plurality of 
keywords input by a user 9, 
a retrieving unit 3 for retrieving, from the retrieval index prepared in the retrieval index preparing unit 6, a plurality of pieces of particular word information corresponding to a plurality of particular words agreeing with the keywords received in the keyword input unit 2, and for retrieving from the particular word information, for each of the particular words, particular occurrence document identifiers identifying the particular hypertext documents in which the particular word agreeing with one keyword appears, and particular positional information indicating the particular occurrence positions of the particular word in the particular hypertext documents and in a plurality of particular parent documents corresponding to the particular hypertext documents,
a document ranking determining unit 4 for unifying, for each of the particular hypertext documents obtained in the retrieving unit 3, the particular hypertext document and the one or more particular parent documents corresponding to it into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7; calculating an occurrence frequency TF of each particular word in each unified particular hypertext document; calculating, for each particular word, an inverse document frequency IDF defined as the inverse of the number of particular hypertext documents in which the particular word appears; calculating the product TF*IDF of each occurrence frequency TF and the corresponding inverse document frequency IDF, and summing the products for all particular words to produce a summed product as an estimated value for each unified particular hypertext document; determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values; determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents; and preparing an index of each of the particular hypertext documents, and
a retrieval result displaying unit 5 for displaying the indexes of the particular hypertext documents, in the ranked order determined in the document ranking determining unit 4, as a retrieval result.
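The estimated value computed by the document ranking determining unit 4 can be sketched as follows. The sketch assumes that the unified documents are already available as word lists, takes TF as the raw occurrence count, and counts a word as appearing in a particular hypertext document when it appears anywhere in that document's unified document; these interpretations and the function names are assumptions.

```python
from collections import Counter

def estimated_values(unified_docs, keywords):
    """Estimated value for each unified particular hypertext document.

    unified_docs: doc_id -> list of words of the unified document.
    TF  = occurrence count of a keyword in one unified document (assumed raw count).
    IDF = 1 / number of unified documents containing the keyword.
    Estimated value = sum over keywords of TF * IDF.
    """
    counts = {doc_id: Counter(words) for doc_id, words in unified_docs.items()}
    scores = {doc_id: 0.0 for doc_id in unified_docs}
    for kw in keywords:
        n_containing = sum(1 for c in counts.values() if c[kw] > 0)
        if n_containing == 0:
            continue  # keyword absent everywhere; contributes nothing
        idf = 1.0 / n_containing
        for doc_id, c in counts.items():
            scores[doc_id] += c[kw] * idf
    return scores

def rank(unified_docs, keywords):
    """Doc ids with a positive estimated value, highest first."""
    scores = estimated_values(unified_docs, keywords)
    return sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)

# Hypothetical unified documents (body words plus parents' anchor words).
unified = {
    "A": "apple farmer apple".split(),
    "B": "apple pie".split(),
    "C": "farmer market".split(),
}
scores = estimated_values(unified, ["apple", "farmer"])
```

With these inputs, "A" scores 2 * 1/2 + 1 * 1/2 = 1.5, while "B" and "C" each score 0.5.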

In the above configuration, the operation of the hypertext retrieving apparatus 1 is described. A plurality of hypertext documents having reference relationships with each other are prepared in a large number of computers widely distributed in a network of a world wide web, and the hypertext documents are distributively managed in the hypertext document managing unit 8. The hypertext document table with parent document list preparing unit 7 has a related document collecting function (generally called a web robot). Therefore, when a plurality of document storing position addresses (generally called universal resource locators) of a plurality of hypertext documents are given to the hypertext document table with parent document list preparing unit 7, the hypertext documents indicated by the universal resource locators are accessed one after another as parent documents, one or more anchor sentences written in each of the parent documents are analyzed, and one or more reference documents are collected for each of the parent documents. Thereafter, mutually distinct hypertext document identifiers are allocated to the collected reference documents in the order of collection to identify them. In this case, the collecting time can be saved when the collected reference documents contain character strings but no images or programs. Also, the document storing position addresses of the collected reference documents are listed so that a reference document that has already been listed is not collected again. Therefore, as shown in Fig. 2, although the parent document D83 relates to the reference document D84 through the anchor sentence S803 and the parent document D84 in turn relates to the reference document D83 through the anchor sentence S806, neither of the hypertext documents D83 and D84 is collected twice.
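The duplicate-free collection performed by the web robot can be sketched as a breadth-first traversal that records every collected address. Real page fetching is replaced by a hypothetical in-memory `links` mapping, and, unlike the text, the starting documents also receive identifiers here; the example link structure mirrors only the D81, D83 and D84 relationships described for Fig. 2.

```python
from collections import deque

def collect_documents(start_urls, links):
    """Collect hypertext documents breadth-first without collecting any twice.

    links: url -> list of urls referenced by anchor sentences in that document
    (a hypothetical stand-in for fetching and parsing real pages).
    Returns url -> hypertext document identifier, allocated in collection order.
    """
    ids = {}          # listed addresses; doubles as the "already collected" check
    queue = deque()
    for url in start_urls:
        if url not in ids:
            ids[url] = len(ids)
            queue.append(url)
    while queue:
        parent = queue.popleft()
        for ref in links.get(parent, []):
            if ref not in ids:  # already listed -> not collected again
                ids[ref] = len(ids)
                queue.append(ref)
    return ids

# D81 -> D83 (S801), D83 -> D84 (S803), D84 -> D83 (S806).
links = {"D81": ["D83"], "D83": ["D84"], "D84": ["D83"]}
collected = collect_documents(["D81"], links)
```

The anchor in D84 that points back to D83 is ignored, because D83 is already listed.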

Thereafter, a hypertext document table of pieces of hypertext document information (refer to Fig. 4), in which the parent document identifiers of one or more parent documents and the anchor sentences of those parent documents are listed for each hypertext document, is prepared in the hypertext document table with parent document list preparing unit 7 according to the following procedure. Document information entry spaces DS1 to DS3, equal in number to the collected reference documents, are prepared. In each of the document information entry spaces, the number of the hypertext document identifier identifying one collected reference document and the document storing position address of that collected reference document are written. Thereafter, the title of the collected reference document is extracted by examining the character strings written in the collected reference document. In this embodiment, for example, the title "apple that I grew" is extracted from the character string "<title>apple that I grew</title>", and the title is written in the document information entry space. Thereafter, the character strings of hypertext mark-up language tags, each being a character string placed between "<" and ">", are removed from the character strings in the body of the collected reference document to form a text body, and the text body is written in the document information entry space. Thereafter, it is checked whether or not one or more anchor sentences relating to the reference document exist in one or more parent documents relating to the reference document. In cases where such an anchor sentence exists in a parent document, a set of the parent document identifier identifying the parent document and the anchor sentence of the parent document is written in the document information entry space to form a parent document list for each piece of hypertext document information. Also, the words used in the text body, the title and the anchor sentences are written in the document information entry space to form a word list for each piece of hypertext document information.
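Title extraction and tag removal can be sketched with regular expressions. This is a simplification assumed for illustration: a production implementation would normally use an HTML parser, the title element is dropped from the text body here, and tags are removed without inserting spaces. The names `extract_title` and `text_body` are hypothetical.

```python
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")  # any character string placed between "<" and ">"

def extract_title(html):
    """Return the text between <title> and </title>, or '' if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else ""

def text_body(html):
    """Drop the title element, remove every tag, and normalize whitespace."""
    without_title = TITLE_RE.sub("", html)
    return " ".join(TAG_RE.sub("", without_title).split())

# Hypothetical contents of apple.html.
page = ("<html><head><title> apple that I grew </title></head>"
        "<body><p>I grow <b>apples</b>.</p></body></html>")
```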

Therefore, in the hypertext document table with parent document list preparing unit 7, as shown in Fig. 4, a document information entry space is prepared for each of the hypertext documents managed by the hypertext document managing unit 8, and a hypertext document identifier, a document storing position, a title, a text body, a parent document list and a word list are written in each of the document information entry spaces to prepare a hypertext document table.

In this embodiment, the hypertext document table is prepared after the one or more anchor sentences written in each of the parent documents have been analyzed to collect the reference documents. Therefore, the anchor sentences are analyzed twice: once to determine the collected reference documents and once to prepare the hypertext document table. However, in cases where the hypertext document table is prepared while the anchor sentences are being analyzed to collect the reference documents, the hypertext document table can be prepared more efficiently.

Thereafter, in the retrieval index preparing unit 6, which has a dictionary, the body of a hypertext document, the title of the hypertext document and the character strings of one or more anchor sentences of the hypertext document are analyzed in advance for each of the hypertext documents of the hypertext document table. A piece of word information, composed of a word, one or more occurrence document identifiers identifying the hypertext documents in which the word appears, and positional information indicating the occurrence positions of the word in those hypertext documents, is prepared for each of a plurality of words stored in the dictionary, and a retrieval index of the pieces of word information for the plurality of words is prepared as shown in Fig. 5.

In detail, tens of thousands of words are registered in the dictionary of the retrieval index preparing unit 6, a plurality of word information entry spaces WS1 to WS3, equal in number to the words registered in the dictionary, are prepared, and each of the words is written in one of the word information entry spaces WS1 to WS3. Thereafter, a word registered in the word list of one document information entry space of the hypertext document table is detected as a particular word, the hypertext document identifier of the particular hypertext document corresponding to that document information entry space is detected as an occurrence hypertext document identifier, one or more positions of the particular word in the particular hypertext document are detected as positional information, and a set of the occurrence hypertext document identifier and the positional information is written as word information in the particular word information entry space corresponding to the particular word. This processing is performed for each of the words registered in the word lists of all document information entry spaces of the hypertext document table, so that a retrieval index of the pieces of word information corresponding to the words used in the hypertext documents is prepared.
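The index-building pass can be pictured as a positional inverted index. The sketch below assumes the hypothetical table layout given in the docstring and records, for each word, the document identifier plus (source, position) pairs in the spirit of Fig. 5. The source labels "Title" and "Body", the use of the parent document identifier to label an anchor sentence, and the optional `dictionary` filter are illustrative choices, not the patent's exact encoding.

```python
import re

WORD_RE = re.compile(r"[a-z0-9]+")  # assumed word rule


def tokenize(text):
    """Lower-cased word tokens."""
    return WORD_RE.findall(text.lower())


def build_index(table, dictionary=None):
    """
    table: doc_id -> {"title": str, "body": str, "anchors": [(parent_id, sentence), ...]}
    Returns word -> {doc_id: [(source, position), ...]}, where source is "Title",
    "Body" or the id of the parent document holding the anchor sentence, and
    position is the 1-based word position within that source.
    """
    index = {}
    for doc_id, info in table.items():
        sources = [("Title", info["title"]), ("Body", info["body"])]
        sources += list(info.get("anchors", []))
        for source, text in sources:
            for position, word in enumerate(tokenize(text), start=1):
                if dictionary is not None and word not in dictionary:
                    continue  # only words registered in the dictionary are indexed
                postings = index.setdefault(word, {})
                postings.setdefault(doc_id, []).append((source, position))
    return index


table = {"D83": {"title": "apple that I grew", "body": "a red apple",
                 "anchors": [("D81", "apple farm")]}}
index = build_index(table)
print(index["apple"])  # {'D83': [('Title', 1), ('Body', 3), ('D81', 1)]}
```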

Fig. 5 shows the piece of word information of the retrieval index that is written in the word information entry space WS1 and corresponds to the word "apple". "(Title,1)" indicates that the word "apple" appears in the first word position of the title of the hypertext document D83, "(Body,4,33,43)" indicates that the word "apple" appears in the 4th, 33rd and 43rd word positions of the body of the hypertext document D83, "(000081,1)" indicates that the word "apple" appears in the first word position of the anchor sentence S801 of the hypertext document D81 functioning as a parent document, and "(000082,4)" indicates that the word "apple" appears in the fourth word position of the anchor sentence S802 of the hypertext document D82 functioning as a parent document.

Also, the inverse of the number of occurrence documents in which a word appears (generally called the inverse document frequency IDF) and the occurrence frequency of the word in each of the occurrence documents (generally called the term frequency TF) may be calculated in advance in the retrieval index preparing unit 6 and written in the corresponding word information entry space for each of the words. In this case, the processing time required for the retrieval can be shortened.
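Precomputing the two statistics could look like the following. The text defines IDF literally as the inverse (reciprocal) of the number of occurrence documents, so this sketch uses 1/df rather than the logarithmic log(N/df) form that is common in information retrieval. The function name and the index layout (word to a mapping of document id to its recorded occurrences) are assumptions.

```python
def precompute_statistics(index):
    """
    index: word -> {doc_id: [occurrence, ...]}; any per-occurrence record works.
    Returns word -> (idf, {doc_id: tf}).
    """
    statistics = {}
    for word, postings in index.items():
        # Reciprocal of the number of occurrence documents, as the text defines IDF.
        idf = 1.0 / len(postings)
        # TF: number of recorded occurrences of the word in each document.
        tf = {doc_id: len(occurrences) for doc_id, occurrences in postings.items()}
        statistics[word] = (idf, tf)
    return statistics


index = {"apple": {"D83": [("Title", 1), ("Body", 4)], "D85": [("Body", 2)]},
         "pie": {"D85": [("Body", 3)]}}
stats = precompute_statistics(index)
print(stats["apple"])  # (0.5, {'D83': 2, 'D85': 1})
```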

Therefore, in the retrieval index preparing unit 6, each of the words appearing in the text body of a hypertext document, the title of the hypertext document and the anchor sentences of the parent documents relating to the hypertext document is analyzed, and an occurrence document list composed of one or more occurrence document identifiers and the positional information is prepared for each word. Accordingly, a retrieval index indicating, for each word, the positions at which the word appears in each of the hypertext documents can be prepared.

The keyword input unit 2 provides a text box and a retrieval starting button for returning the contents of the text box, and an HTML document written in the hypertext mark-up language and having a title such as "retrieval page" is employed as the keyword input unit 2. That is, the user 9 calls the HTML document in a World Wide Web browser such as Mosaic or Netscape operating on his own client computer, inputs a single keyword or a plurality of keywords separated by spaces into the text box, and pushes the retrieval starting button. In this way, the single keyword or the keywords are input.

Therefore, a plurality of keywords input by the user 9 are received in the keyword input unit 2 and are transmitted to the retrieving unit 3. In this embodiment, the user inputs each of the keywords by pushing a plurality of keys arranged on a keyboard. However, in cases where each keyword is selected from a plurality of candidates by pushing a button, the keyword input operation can be easily performed with a pointing device, without any keyboard, even when an unskilled person operates the keyword input unit 2.

In the retrieving unit 3, the pieces of particular word information corresponding to the particular words that agree with the keywords input through the keyword input unit 2 are extracted from the retrieval index stored in the retrieval index preparing unit 6. For each of the particular words, one or more occurrence document identifiers identifying the particular hypertext documents in which the particular word appears, and positional information indicating the positions of the particular word in those particular hypertext documents, are obtained from the corresponding piece of word information. The resulting sets of occurrence document identifiers and positional information are transmitted to the document ranking determining unit 4.

In the document ranking determining unit 4, the pieces of hypertext document information corresponding to the particular hypertext documents identified by the occurrence document identifiers are extracted from the hypertext document table, and each particular hypertext document and the one or more parent documents identified by the parent document identifiers listed in its piece of hypertext document information are unified into a unified particular hypertext document. A unified particular hypertext document is formed for each of the particular hypertext documents identified by the occurrence document identifiers transmitted from the retrieving unit 3. Thereafter, for each of the particular words, an inverse document frequency IDF, defined as the inverse of the number of unified particular hypertext documents in which the particular word agreeing with a keyword appears, and the occurrence frequency TF of the particular word in each of the unified particular hypertext documents are calculated according to the sets of occurrence document identifiers and positional information. The inverse document frequency IDF serves as a correction factor for each particular word.

Thereafter, in cases where only one keyword is input, an estimated value obtained by multiplying the inverse document frequency IDF of the particular word by the occurrence frequency TF is calculated as the importance degree of each of the unified particular hypertext documents. In cases where the user inputs two or more keywords, a product TF*IDF of the occurrence frequency TF and the inverse document frequency IDF is calculated for each keyword and each unified particular hypertext document, the sum of the products calculated for all keywords is adopted as the estimated value of each unified particular hypertext document, and the importance degree of each unified particular hypertext document is determined according to its estimated value. The importance degree of each unified particular hypertext document is set as the importance degree of the particular hypertext document corresponding to it. Thereafter, the ranking of the particular hypertext documents, with their parent documents taken into account, is determined according to the importance degrees of the particular hypertext documents.
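The unification and TF*IDF scoring described in the two preceding paragraphs can be sketched as below. Assumptions: word counts for every document are available as `Counter` objects, only direct parent documents are merged into a unified document, IDF is the literal reciprocal of the number of unified documents containing the keyword, and ties are broken by document identifier. `rank_unified` is a hypothetical name.

```python
from collections import Counter


def rank_unified(keywords, candidates, counts, parents):
    """
    keywords: lower-cased keywords.
    candidates: ids of the particular hypertext documents from the retrieving unit.
    counts: doc_id -> Counter of words in that document.
    parents: doc_id -> list of parent document ids.
    Returns [(doc_id, estimated value), ...] in ranked order.
    """
    # Unify each particular document with its parents by adding word counts.
    unified = {}
    for doc_id in candidates:
        total = Counter(counts.get(doc_id, Counter()))
        for parent_id in parents.get(doc_id, []):
            total.update(counts.get(parent_id, Counter()))
        unified[doc_id] = total

    scores = {doc_id: 0.0 for doc_id in candidates}
    for keyword in keywords:
        # Number of unified documents in which the keyword appears.
        df = sum(1 for doc_id in candidates if unified[doc_id][keyword] > 0)
        if df == 0:
            continue
        idf = 1.0 / df
        for doc_id in candidates:
            scores[doc_id] += unified[doc_id][keyword] * idf  # TF * IDF, summed over keywords
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


counts = {"D81": Counter({"apple": 1, "farm": 2}),
          "D83": Counter({"apple": 4, "tree": 1}),
          "D85": Counter({"pie": 2})}
parents = {"D83": ["D81"], "D85": ["D83"]}
print(rank_unified(["apple"], ["D83", "D85"], counts, parents))
# [('D83', 2.5), ('D85', 2.0)]
```

With a single keyword the loop reduces to TF times IDF; with several keywords the per-keyword products are summed, as the text describes.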

In cases where two or more keywords are input, the estimated value of a particular hypertext document may be set to N times (N being two or more) the sum of the products TF*IDF calculated for all keywords when N particular words agreeing with N keywords appear in the particular hypertext document. In this case, because the correlation among the N keywords is reflected in the importance degree of each particular hypertext document, the user's retrieval request can be satisfied more fully.
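The N-times weighting can be expressed as a small adjustment on top of the summed score. In this hypothetical helper, N is taken to be the number of distinct keywords whose TF in the document is non-zero, and the multiplier is applied only when N is at least two.

```python
def boosted_score(tf, idf):
    """
    tf: keyword -> occurrence frequency TF in the (unified) document.
    idf: keyword -> inverse document frequency IDF, one entry per input keyword.
    """
    base = sum(tf.get(keyword, 0) * idf[keyword] for keyword in idf)
    n = sum(1 for keyword in idf if tf.get(keyword, 0) > 0)  # keywords present
    return base * n if n >= 2 else base


print(boosted_score({"apple": 3, "pie": 2}, {"apple": 0.5, "pie": 1.0}))  # 7.0
```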

Also, in cases where two particular words agreeing with two keywords appear in one particular hypertext document within 20 characters of each other, the estimated value of the unified particular hypertext document may be doubled. In this case, because the correlation between two keywords appearing close to each other is reflected in the importance degree of each particular hypertext document, the user's retrieval request can be satisfied more fully.
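The 20-character proximity rule leaves open how the distance is measured. The sketch below assumes it is the number of characters between the end of one keyword occurrence and the start of the other, and that keywords are matched as whole words, case-insensitively. All names are hypothetical.

```python
import re


def keyword_spans(text, keyword):
    """(start, end) character spans of whole-word, case-insensitive matches."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


def within_window(text, kw1, kw2, window=20):
    """True if some occurrence of kw1 and some occurrence of kw2 are at most
    `window` characters apart (gap measured between the two occurrences)."""
    for s1, e1 in keyword_spans(text, kw1):
        for s2, e2 in keyword_spans(text, kw2):
            gap = s2 - e1 if s2 >= e1 else s1 - e2
            if 0 <= gap <= window:
                return True
    return False


def proximity_adjusted(score, text, kw1, kw2, window=20):
    """Double the estimated value when the two keywords appear close together."""
    return score * 2 if within_window(text, kw1, kw2, window) else score


print(proximity_adjusted(3.5, "a red apple pie from the farm", "apple", "pie"))  # 7.0
```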

Thereafter, in the document ranking determining unit 4, an HTML document in which the indexes of the particular hypertext documents are listed in the ranked order is prepared and transmitted to the retrieval result displaying unit 5. In this case, the index of a particular hypertext document is either the title of the particular hypertext document or the character string of an anchor sentence written in one of its parent documents. A document storing position address indicating the position of the particular hypertext document in the hypertext document managing unit 8 is embedded in the index, so that the index functions as an anchor sentence. That is, when the user selects the index of a particular hypertext document, the particular hypertext document is called from the hypertext document managing unit 8 according to the document storing position address.
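A result page of this kind might be generated as follows. The sketch assumes each table entry carries a title, an optional anchor sentence used when the title is empty, and a storing position usable directly as a link target. `result_page` and the page layout are illustrative only.

```python
from html import escape


def result_page(ranked_ids, table):
    """
    ranked_ids: particular hypertext document ids in ranked order.
    table: doc_id -> {"title": str, "anchor": str (optional), "position": str};
    "position" stands in for the document storing position address.
    """
    items = []
    for doc_id in ranked_ids:
        info = table[doc_id]
        label = info["title"] or info.get("anchor", doc_id)  # title, else an anchor sentence
        href = escape(info["position"], quote=True)          # embedded storing position
        items.append(f'<li><a href="{href}">{escape(label)}</a></li>')
    return ("<html><head><title>retrieval result</title></head><body><ol>\n"
            + "\n".join(items) + "\n</ol></body></html>")


table = {"D83": {"title": "apple that I grew", "position": "http://a.example/d83.html"},
         "D85": {"title": "", "anchor": "apple pie", "position": "http://a.example/d85.html"}}
print(result_page(["D83", "D85"], table))
```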

Therefore, in the document ranking determining unit 4, the one or more parent documents having a reference relationship with each particular hypertext document are extracted from the hypertext document table prepared in the reference document table with parent document preparing unit 7; each particular hypertext document and its parent documents are unified into a unified particular hypertext document; the importance degree of each particular hypertext document, including its parent documents, is determined according to an estimated value TF*IDF*N for the particular hypertext document; the particular hypertext documents are ranked according to those importance degrees; and the particular hypertext documents are listed in the ranked order.

In this embodiment, the occurrence frequency TF of a word is not normalized, because it is not divided by the size of the unified particular hypertext document. However, in cases where the occurrence frequency TF is normalized by dividing it by the size of the unified particular hypertext document, the size of each hypertext document must be written in the hypertext document table.
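If normalization is wanted, the division could be done as below. Both assumptions are illustrative: the table is taken to store each document's size as a word count, and the size of a unified document is taken to be the sum of the sizes of the document and its direct parents.

```python
def unified_size(doc_id, sizes, parents):
    """Size of a unified document: its own size plus the sizes of its parents."""
    return sizes[doc_id] + sum(sizes[p] for p in parents.get(doc_id, []))


def normalized_tf(tf, doc_id, sizes, parents):
    """Divide the raw occurrence frequency by the size of the unified document."""
    size = unified_size(doc_id, sizes, parents)
    return tf / size if size else 0.0


sizes = {"D81": 40, "D83": 60}  # word counts stored in the hypertext document table
parents = {"D83": ["D81"]}
print(normalized_tf(5, "D83", sizes, parents))  # 0.05
```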

The retrieval result displaying unit 5 is embodied by a World Wide Web browser such as Mosaic or Netscape operating on the user's own client computer. The HTML document prepared in the document ranking determining unit 4 is displayed on a display of the client computer. Thereafter, when the user selects, with a pointing device, the index of a particular hypertext document listed in the HTML document, the position of the selected particular hypertext document is ascertained according to the document storing position address embedded in the index, and the particular hypertext document is called from the hypertext document managing unit 8.

Therefore, in the retrieval result displaying unit 5, the indexes of the particular hypertext documents listed in the HTML document are displayed, the user selects the index of one particular hypertext document, and the particular hypertext document selected by the user is called from the hypertext document managing unit 8.

Accordingly, because the one or more parent documents having a reference relationship with each reference document are listed in the hypertext document table prepared by the reference document table with parent document preparing unit 7, the parent documents corresponding to a reference document can be specified by extracting the document information corresponding to that reference document from the hypertext document table. Therefore, because it is not necessary to ask the hypertext document managing unit 8 for the parent documents corresponding to the reference document, the parent documents corresponding to each reference document can be quickly ascertained.

Also, because each particular hypertext document and the one or more parent documents having a reference relationship with it are unified into a unified particular hypertext document in the document ranking determining unit 4, an importance degree can be determined for each of the unified particular hypertext documents. Therefore, the ranking of the particular hypertext documents in which a particular word agreeing with a keyword appears can be determined according to the importance degrees while considering the parent documents corresponding to each particular hypertext document. Accordingly, the indexes of the particular hypertext documents can be displayed by the retrieval result displaying unit 5 in a ranking that reliably satisfies the user's retrieval request expressed by the keyword, and the user can select the particular hypertext documents in the ranked order.

Also, because each piece of document information of the hypertext document table prepared by the reference document table with parent document preparing unit 7 lists a hypertext document together with the one or more anchor sentences of the one or more parent documents having a reference relationship with it, the retrieval index preparing unit 6 can easily prepare each piece of word information of the retrieval index, indicating that a word appears in a hypertext document or in one or more anchor sentences of its parent documents. In addition, because the one or more parent documents having a reference relationship with each reference document are listed in the hypertext document table prepared by the reference document table with parent document preparing unit 7, it is not necessary to ask the hypertext document managing unit 8 for the parent documents corresponding to a reference document when the retrieval index is prepared in the retrieval index preparing unit 6. Therefore, the retrieval index can be quickly prepared.

(Second Embodiment) 

Fig. 6 is a block diagram of a hypertext retrieving 
apparatus according to a second embodiment of the 
present invention. 

As shown in Fig. 6, a hypertext retrieving apparatus 11 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8 comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 12 for unifying each particular hypertext document obtained in the retrieving unit 3 and the one or more particular parent documents corresponding to it into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, calculating estimated values for the unified particular hypertext documents according to the particular word information of the retrieval index obtained in the retrieval index preparing unit 6, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents, and preparing, for each of the particular hypertext documents, an index of the particular hypertext document together with an index of a particular parent document corresponding to it, and
a retrieval result displaying unit 13 for displaying, as a retrieval result, the index of each particular hypertext document with the index of its particular parent document for each of the unified particular hypertext documents in the ranked order determined in the document ranking determining unit 12.

In the above configuration, after the ranking of the particular hypertext documents is determined according to the importance degrees in the document ranking determining unit 12 in the same manner as in the first embodiment, not only an index of each particular hypertext document but also an index of a particular parent document corresponding to it are prepared for each of the particular hypertext documents. In cases where a plurality of parent documents correspond to the particular hypertext document, the parent document whose document storing position is closest to that of the particular hypertext document is selected as the particular parent document. This selection is performed by comparing a portion of the character string indicating the document storing position of each parent document with a portion of the character string indicating the document storing position of the particular hypertext document. Also, in this embodiment, the particular parent document (the first-stage particular parent document) is regarded as a second-stage reference document, a second-stage particular parent document having a reference relationship with the second-stage reference document is specified, and an index of the second-stage particular parent document is prepared. Thereafter, the index of each particular hypertext document is displayed by the retrieval result displaying unit 13 together with the index of the first-stage particular parent document and the index of the second-stage particular parent document.
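The text says only that a portion of each storing-position string is compared. One plausible reading, used in this hypothetical sketch, is to pick the parent whose storing position shares the longest common prefix with that of the particular document (for URLs, roughly the nearest directory), and then to repeat the step to obtain the second-stage parent. The storing positions below are made-up examples.

```python
import os


def common_prefix_len(a, b):
    """Length of the longest common leading substring of two strings."""
    return len(os.path.commonprefix([a, b]))


def closest_parent(doc_id, parents, positions):
    """Parent whose storing position shares the longest prefix with doc_id's.
    max() keeps the first candidate on ties, so list order breaks ties."""
    candidates = parents.get(doc_id, [])
    if not candidates:
        return None
    target = positions[doc_id]
    return max(candidates, key=lambda p: common_prefix_len(positions[p], target))


def parent_chain(doc_id, parents, positions, stages=2):
    """First-stage, second-stage, ... particular parent documents."""
    chain = []
    current = doc_id
    for _ in range(stages):
        current = closest_parent(current, parents, positions)
        if current is None:
            break
        chain.append(current)
    return chain


positions = {"D80": "http://a.example/top.html",
             "D81": "http://a.example/fruit/index.html",
             "D82": "http://b.example/x.html",
             "D83": "http://a.example/fruit/apple/d83.html"}
parents = {"D83": ["D82", "D81"], "D81": ["D80"]}
print(parent_chain("D83", parents, positions))  # ['D81', 'D80']
```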

Fig. 7 shows an example of the index of each particular hypertext document displayed by the retrieval result displaying unit 13 with the index of the first-stage particular parent document and the index of the second-stage particular parent document.

As shown in Fig. 7, in cases where the fourth rank is given to the hypertext document D83, the 18th rank is given to the hypertext document D85 and the 19th rank is given to the hypertext document D86, the index of the particular hypertext document D83 is displayed with the index of the first-stage particular parent document D81 and the index of the second-stage particular parent document D80 as a fourth ranking group, the index of the particular hypertext document D85 is displayed with the index of the first-stage particular parent document D83 and the index of the second-stage particular parent document D81 as an 18th ranking group, and the index of the particular hypertext document D86 is displayed with the index of the first-stage particular parent document D83 and the index of the second-stage particular parent document D81 as a 19th ranking group.

Accordingly, even when the hypertext document D86, which has no anchor sentence, is selected as a particular hypertext document, the hypertext document D83 or D81 closely related to the hypertext document D86 can be easily selected and called from the hypertext document managing unit 8 without relying on any anchor sentence. That is, because hypertext documents having a reference relationship with each other are closely related, displaying the indexes of the first-stage and second-stage particular parent documents is very useful for the user.

(Third Embodiment) 

In the first or second embodiment, in cases where the hypertext document D83 of the fourth rank is called and read, the hypertext document D85 is then called and read by selecting the position of the anchor sentence S804, and the hypertext documents of the ranks following the fourth rank are called and read one by one, the hypertext document D85 of the 18th rank may be erroneously called and read again because the user has forgotten that the hypertext document D85 was already read. Also, even when the hypertext document D86 of the 19th rank is called and read, a long time has elapsed since the hypertext document D83 of the fourth rank was called and read, so the user may not understand the context of the hypertext document D86, which closely relates to the context of the hypertext document D83. Therefore, to solve these drawbacks, in the third embodiment the ranks given to a plurality of hypertext documents closely relating to each other are set to the same rank.

Fig. 8 is a block diagram of a hypertext retrieving apparatus according to a third embodiment of the present invention.

As shown in Fig. 8, a hypertext retrieving apparatus 21 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8 comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 22 for unifying each particular hypertext document obtained in the retrieving unit 3 and the one or more particular parent documents corresponding to it into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, calculating estimated values for the unified particular hypertext documents according to the particular word information of the retrieval index obtained in the retrieval index preparing unit 6, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents on condition that two or more particular hypertext documents closely relating to each other are given the same rank, and preparing an index of each of the particular hypertext documents, and

a retrieval result displaying unit 23 for displaying, as a retrieval result, the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 22, on condition that two or more particular hypertext documents set to the same rank are displayed as one group together with the one or more particular parent documents corresponding to any of those particular hypertext documents.

In the above configuration, after the importance degrees of the particular hypertext documents are calculated and the ranking of the particular hypertext documents is determined according to the importance degrees in the document ranking determining unit 22 in the same manner as in the first embodiment, the one or more parent document identifiers listed in the piece of document information of the hypertext document table corresponding to each particular hypertext document are extracted, and the one or more parent documents identified by those parent document identifiers are specified for each particular hypertext document. Thereafter, it is judged whether or not each of the parent documents agrees with one of the particular hypertext documents. In cases where a parent document corresponding to a first particular hypertext document of a rank A agrees with a second particular hypertext document of a rank B, the first and second particular hypertext documents are judged to closely relate to each other, and both are reset to the higher of the ranks A and B. Thereafter, the indexes of the particular hypertext documents are displayed in the ranked order by the retrieval result displaying unit 23.
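The rank-resetting rule can be sketched with a union-find structure. The text describes pairs of documents; grouping with union-find is an added assumption so that chains of related documents land in one group, and each group takes its best (numerically smallest) rank. `merge_related_ranks` is a hypothetical name.

```python
def merge_related_ranks(ranks, parents):
    """
    ranks: doc_id -> rank (1 is best) of each particular hypertext document.
    parents: doc_id -> list of parent document ids.
    Returns doc_id -> rank, where closely related documents share the best
    rank found in their group.
    """
    group = {doc_id: doc_id for doc_id in ranks}

    def find(doc_id):
        while group[doc_id] != doc_id:
            group[doc_id] = group[group[doc_id]]  # path halving
            doc_id = group[doc_id]
        return doc_id

    for doc_id in ranks:
        for parent_id in parents.get(doc_id, []):
            if parent_id in ranks:  # the parent is itself a particular document
                group[find(doc_id)] = find(parent_id)

    best = {}
    for doc_id, rank in ranks.items():
        root = find(doc_id)
        best[root] = min(rank, best.get(root, rank))
    return {doc_id: best[find(doc_id)] for doc_id in ranks}


ranks = {"D83": 4, "D85": 18, "D86": 19, "D90": 7}
parents = {"D83": ["D81"], "D85": ["D83"], "D86": ["D83"]}
print(merge_related_ranks(ranks, parents))
# {'D83': 4, 'D85': 4, 'D86': 4, 'D90': 7}
```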

For example, because the parent document D83 corresponding to the hypertext document D85 of the 18th rank agrees with the hypertext document D83 of the fourth rank, the hypertext document D85 is reset to the fourth rank. Also, because the parent document D83 corresponding to the hypertext document D86 of the 19th rank agrees with the hypertext document D83 of the fourth rank, the hypertext document D86 is reset to the fourth rank.

Therefore, because particular hypertext documents closely relating to each other are set to the same rank and displayed close to each other, the user can read them consecutively and can easily grasp their contexts. Accordingly, the same particular hypertext document is prevented from being erroneously read again, and the user can efficiently read a group of closely related particular hypertext documents.

In this embodiment, particular hypertext documents closely relating to each other are set to the highest of the ranks given to them. However, the third embodiment is not limited to this approach. That is, when a plurality of particular hypertext documents closely relating to each other are determined, the sum of their importance degrees may be calculated and the particular hypertext documents may be reset to the same, higher rank according to the summed importance degree.
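The alternative based on summed importance degrees might look like this. The groups of closely related documents are assumed to be already known (for example, from the parent-document comparison described above), unrelated documents form singleton groups, and groups are numbered 1, 2, 3, ... in descending order of summed importance; that dense group numbering is an assumption.

```python
def rank_by_group_sum(importance, groups):
    """
    importance: doc_id -> importance degree.
    groups: list of sets of closely related doc ids; every document is in
            exactly one group, unrelated documents in singleton groups.
    Returns doc_id -> rank, with rank 1 for the group with the largest sum.
    """
    totals = [(sum(importance[d] for d in group), sorted(group)) for group in groups]
    totals.sort(key=lambda t: (-t[0], t[1]))  # larger sum first, then by ids
    ranks = {}
    for rank, (_, members) in enumerate(totals, start=1):
        for doc_id in members:
            ranks[doc_id] = rank
    return ranks


importance = {"D83": 2.5, "D85": 0.5, "D86": 0.4, "D90": 3.0}
groups = [{"D83", "D85", "D86"}, {"D90"}]
print(rank_by_group_sum(importance, groups))
# {'D83': 1, 'D85': 1, 'D86': 1, 'D90': 2}
```

Here D90 alone outscores D83 (3.0 against 2.5), but the summed degree of the related group (3.4) moves the group ahead.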

Also, the concept of the second embodiment and the concept of the third embodiment are preferably combined. For example, as shown in Fig. 7, when, according to the second embodiment, a first group of the particular hypertext document D83 and the parent documents D80 and D81 is set to the fourth rank, a second group of the particular hypertext document D85 and the parent documents D81 and D83 is set to the 18th rank, and a third group of the particular hypertext document D86 and the parent documents D81 and D83 is set to the 19th rank, the second group of documents D81, D83 and D85 set to the 18th rank is reset to the fourth rank, the third group of documents D81, D83 and D86 set to the 19th rank is reset to the fourth rank, and a combined group of the particular hypertext documents D83, D85 and D86 and the parent documents D80 and D81 reset to the fourth rank is displayed as shown in Fig. 9.

(Fourth Embodiment) 

In general, a special word indicating a feature of a reference document appears many times in the one or more anchor sentences of the one or more parent documents corresponding to the reference document. Therefore, in cases where the estimated value of the reference document is calculated by considering the special word appearing in the anchor sentences of the parent documents, and the reference document is ranked according to that estimated value, the reliability of retrieving hypertext documents likely to meet a user's retrieval request can be improved.

Fig. 10 is a block diagram of a hypertext retrieving 
apparatus according to a fourth embodiment of the 
present invention.

As shown in Fig. 10, a hypertext retrieving apparatus 31 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8 comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 32 for calculating, for each of the particular hypertext documents and according to the particular word information of the retrieval index obtained in the retrieval index preparing unit 6, the occurrence frequency of each particular word in the particular hypertext document and in the one or more anchor sentences of the one or more particular parent documents corresponding to it as a revised occurrence frequency TF for the particular hypertext document, calculating estimated values of the particular hypertext documents according to the revised occurrence frequencies TF and the inverse document frequencies IDF, determining a plurality of importance degrees of the particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees, and preparing indexes of the particular hypertext documents, and

a retrieval result displaying unit 33 for displaying, as a retrieval result, the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 32.

In the above configuration, in cases where the user inputs the keyword "apple", as shown in Fig. 4, the particular word "apple" appears four times in the title and the body of the hypertext document D83. The particular word "apple" also appears in the anchor sentence S801 of the parent document D81 and in the anchor sentence S802 of the parent document D82. Therefore, because the sum of the occurrence frequencies of the particular word "apple" in the hypertext document D83 and in the anchor sentences S801 and S802 of the parent documents D81 and D82 is 6, the revised occurrence frequency TF for the particular hypertext document D83 is set to 6, and an estimated value of the particular hypertext document D83 is calculated by using the revised occurrence frequency TF in the document ranking determining unit 32. Accordingly, the particular hypertext document D83 is ranked higher, so that the reliability of the retrieval of the particular hypertext document D83 can be improved.
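The revised occurrence frequency TF of the fourth embodiment can be illustrated with a minimal Python sketch. It assumes that the text of the particular hypertext document and the anchor sentences of its parent documents are available as plain strings, and that a word matches only as a whole word, case-insensitively. The function names and the sample text are hypothetical stand-ins, not part of the apparatus described above.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def revised_tf(document_text: str, parent_anchor_sentences: list[str], word: str) -> int:
    """Revised TF (hypothetical helper): occurrences in the document itself
    plus occurrences in the anchor sentences of its parent documents."""
    own = count_word(document_text, word)
    from_anchors = sum(count_word(s, word) for s in parent_anchor_sentences)
    return own + from_anchors


# Hypothetical stand-ins for D83 (four occurrences) and the anchor sentences S801, S802.
d83_text = ("Apple growing\n"
            "The apple is picked in autumn. Each apple is sorted, "
            "and a good apple is shipped.")
anchors = ["Click here for apple farms", "Information on the apple harvest"]
print(revised_tf(d83_text, anchors, "apple"))  # 6
```

The revised TF is then used in place of the plain in-document count when the TF*IDF estimated value is computed.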

(Fifth Embodiment) 

In the first to fourth embodiments, in cases where the user desires to know an outline of the contents of one particular hypertext document when an index of the particular hypertext document is displayed, the user is required to call the particular hypertext document from the hypertext document managing unit 8. Therefore, in cases where the user desires to read the contents of many particular hypertext documents, it is troublesome for the user to call each of the particular hypertext documents.

Fig. 11 is a block diagram of a hypertext retrieving 
apparatus according to a fifth embodiment of the 
present invention. 

As shown in Fig. 11, a hypertext retrieving apparatus 41 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 42 for unifying one particular hypertext document and one or more particular parent documents corresponding to the particular hypertext document into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, for each of the particular hypertext documents obtained in the retrieving unit 3, calculating estimated values for the unified particular hypertext documents for each particular word according to the particular word information of the retrieval index obtained in the retrieval index preparing unit 6, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values for each particular word, determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents for each particular word, preparing an index of one particular hypertext document for each of the particular hypertext documents, and preparing a plurality of summaries of the particular hypertext documents for each of the particular words, and

a retrieval result displaying unit 43 for displaying a group of the indexes of the particular hypertext documents with the summaries of the particular hypertext documents in the ranked order determined in the document ranking determining unit 42 for each particular word as a retrieval result.

In the above configuration, after the indexes of the particular hypertext documents are prepared in the document ranking determining unit 42, a particular sentence or a particular phrase including one particular word is extracted from one particular hypertext document according to the positional information of the word information of the retrieval index prepared by the retrieval index preparing unit 6, and a summary in which the particular sentence or the particular phrase follows the top sentence or the top phrase of the particular hypertext document is prepared for each of the particular words and each of the particular hypertext documents. In cases where a plurality of particular sentences or a plurality of particular phrases including one particular word exist in one particular hypertext document, a summary is prepared in which the particular sentences or the particular phrases, arranged in the order in which they appear, follow the top sentence or the top phrase of the particular hypertext document. Thereafter, the indexes of the particular hypertext documents with the summaries of the particular hypertext documents are displayed for each particular word by the retrieval result displaying unit 43 in the ranked order determined in the document ranking determining unit 42.
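The summary preparation above can be sketched in Python as follows. This is a minimal illustration under two assumptions: the document is available as plain text rather than through the positional information of the retrieval index, and sentences are split naively at ".", "!" or "?" followed by whitespace. The function names and the sample text are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_summary(text: str, word: str) -> str:
    """Top sentence followed by every later sentence containing `word`,
    kept in the order in which they appear in the document."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    top, rest = sentences[0], sentences[1:]
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = [s for s in rest if pattern.search(s)]
    return " ".join([top] + hits)


doc = ("Fruit of northern Japan. Many apples are grown here. "
       "The apple harvest starts in autumn. Prices vary. An apple festival is held.")
print(make_summary(doc, "apple"))
# Fruit of northern Japan. The apple harvest starts in autumn. An apple festival is held.
```

Because matching is whole-word, "apples" is not treated as an occurrence of "apple" in this sketch; a real index might apply stemming instead.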

Accordingly, because the summary of each particular hypertext document is displayed, the user can grasp an outline of the contents of each particular hypertext document by reading its summary without calling the particular hypertext document from the hypertext document managing unit 8, so that the user can easily select one or more particular hypertext documents meeting the user's retrieval request.

In this embodiment, even when a particular word appears many times in one particular hypertext document, all particular sentences or all particular phrases including the particular word are extracted from the particular hypertext document, and a summary is prepared. However, in cases where the summary obtained by connecting a series of particular sentences or particular phrases of the particular hypertext document to the top sentence or the top phrase of the particular hypertext document becomes too long, it is difficult for the user to grasp the long summary quickly. Therefore, the number of particular sentences or particular phrases connected to the top sentence or the top phrase of the particular hypertext document to prepare the summary for each particular word may be limited according to the number of keywords input by the user: three particular sentences or three particular phrases when five or fewer keywords are input, two when six to ten keywords are input, and one when eleven or more keywords are input. This prevents the summary from becoming too long, and the user can efficiently read the summaries displayed by the retrieval result displaying unit 43.
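The keyword-dependent limit described above reduces to a small lookup. The following minimal Python sketch assumes that the top sentence and the sentences containing one particular word have already been extracted in document order; the function names are hypothetical.

```python
def max_extracts(num_keywords: int) -> int:
    """Number of extracted sentences or phrases allowed per particular word."""
    if num_keywords <= 5:
        return 3
    if num_keywords <= 10:
        return 2
    return 1


def limited_summary(top: str, hits: list[str], num_keywords: int) -> str:
    """Join the top sentence with at most max_extracts(...) hits, in document order."""
    return " ".join([top] + hits[:max_extracts(num_keywords)])


hits = ["S1.", "S2.", "S3.", "S4."]
print(limited_summary("Top.", hits, 3))   # Top. S1. S2. S3.
print(limited_summary("Top.", hits, 8))   # Top. S1. S2.
print(limited_summary("Top.", hits, 12))  # Top. S1.
```

Taking the first hits keeps the sentences in the order in which they appear; other selection policies (for example, by position of the keyword) would be equally compatible with the rule.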

It is also preferred that the concept of the second embodiment and the concept of the fifth embodiment be combined. For example, when, according to the second embodiment, a first group of the particular hypertext document D83 and the parent documents D80 and D81 is set to the fourth rank, a second group of the particular hypertext document D85 and the parent documents D81 and D83 is set to the 18th rank, and a third group of the particular hypertext document D86 and the parent documents D81 and D83 is set to the 19th rank, then, as shown in Fig. 12, a summary of the particular hypertext document D83 is added to the first group, a summary of the particular hypertext document D85 is added to the second group, and a summary of the particular hypertext document D86 is added to the third group.

(Sixth Embodiment) 

In the world wide web, a composition (or an article) may be divided into a number of portions, each portion of the composition being written in one hypertext document. Therefore, the context of the composition may not be sufficiently expressed in the portion written in one hypertext document. For example, although an apple grown in Aomori is described in the composition, the word "Aomori" indicating the production place of the apple is not written in the hypertext document D83 but is written in the parent document D81.

Therefore, in cases where a plurality of keywords expressing the context of a composition are used separately in a hypertext document and in a plurality of parent documents having a reference relationship with the hypertext document, the hypertext document is undesirably ranked to a lower class in the prior art. In the sixth embodiment, however, one combined hypertext document, produced by combining a retrieval hypertext document (or a particular hypertext document) with one parent document having a reference relationship with the retrieval hypertext document, is prepared for each of the parent documents, the importance degrees of the combined hypertext documents are compared with each other, the combined hypertext document having the maximum importance degree is selected, and the maximum importance degree is used as the importance degree for the retrieval hypertext document.

Fig. 13 is a block diagram of a hypertext retrieving apparatus according to a sixth embodiment of the present invention.

As shown in Fig. 13, a hypertext retrieving apparatus 51 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,
a document ranking determining unit 52 for combining one particular hypertext document and one particular parent document corresponding to the particular hypertext document to form a combined particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, for each of the particular parent documents corresponding to the particular hypertext document and each of the particular hypertext documents obtained in the retrieving unit 3, calculating estimated values for the combined particular hypertext documents according to the particular word information of the retrieval index obtained in the retrieval index preparing unit 6 for each of the particular hypertext documents, determining a plurality of importance degrees of the combined particular hypertext documents according to the estimated values for each of the particular hypertext documents, comparing the importance degrees of the combined particular hypertext documents with each other for each of the particular hypertext documents, selecting the maximum importance degree among the importance degrees of the combined particular hypertext documents relating to one particular hypertext document for each of the particular hypertext documents, setting the maximum importance degree as the importance degree for the particular hypertext document for each of the particular hypertext documents, determining the ranking of the particular hypertext documents according to those importance degrees and preparing an index of one particular hypertext document for each of the particular hypertext documents, and

a retrieval result displaying unit 53 for displaying a group of the indexes of the particular hypertext documents with the summaries of the particular hypertext documents in the ranked order determined in the document ranking determining unit 52 for each particular word as a retrieval result.

In the above configuration, suppose that the user inputs the keyword "apple" and another keyword "Aomori", that the word "apple" appears in the hypertext document D83, and that the word "Aomori", indicating an apple-producing prefecture, appears in the hypertext document D81 but in neither the hypertext document D83 nor the hypertext document D82. Because the particular word "apple" agreeing with the keyword "apple" appears in the hypertext document D83, the hypertext document D83 is set as a particular hypertext document in the retrieving unit 3.

Thereafter, in the document ranking determining unit 52, the particular hypertext document D83 and the particular parent document D81 are combined to form a first combined particular hypertext document, the particular hypertext document D83 and the particular parent document D82 are combined to form a second combined particular hypertext document, estimated values for the combined particular hypertext documents are calculated for each of the particular words, and a first sum of the estimated values of the first combined particular hypertext document over the particular words and a second sum of the estimated values of the second combined particular hypertext document over the particular words are calculated. In this case, because the particular word "Aomori" appears in the hypertext document D81 but not in the hypertext document D82, the first sum is higher than the second sum. Therefore, the first combined particular hypertext document is selected, the first sum is set as the estimated value of the particular hypertext document D83 for the keywords "apple" and "Aomori", and an importance degree for the particular hypertext document D83 is calculated from this estimated value. In the same manner, importance degrees for the other particular hypertext documents are calculated, and the ranking of the particular hypertext documents is determined according to the importance degrees.
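The selection of the maximum over the combined documents can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each combined particular hypertext document is modelled by concatenating the text of the particular hypertext document with the text of one parent document, the estimated value is the sum of TF*IDF over the input keywords, and the IDF values are supplied by the caller. The names, the IDF values and the sample strings are hypothetical.

```python
import re


def tf(text: str, word: str) -> int:
    """Whole-word, case-insensitive occurrence count."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE))


def estimated_value(text: str, keywords: list[str], idf: dict[str, float]) -> float:
    """Sum of TF*IDF over the input keywords."""
    return sum(tf(text, w) * idf[w] for w in keywords)


def best_combined_value(doc_text: str, parent_texts: list[str],
                        keywords: list[str], idf: dict[str, float]) -> float:
    """Combine the document with each parent separately and keep the maximum."""
    if not parent_texts:  # assumption: a document without parents is scored alone
        return estimated_value(doc_text, keywords, idf)
    return max(estimated_value(doc_text + "\n" + parent, keywords, idf)
               for parent in parent_texts)


idf = {"apple": 0.5, "Aomori": 1.0}          # hypothetical IDF values
d83 = "The apple harvest starts in autumn."
d81 = "Fruit from Aomori prefecture."        # parent containing "Aomori"
d82 = "More fruit links."                    # parent without it
print(best_combined_value(d83, [d81, d82], ["apple", "Aomori"], idf))  # 1.5
```

Combining with one parent at a time, rather than with all parents at once, is what lets a single relevant parent lift the document without unrelated parents diluting or inflating the score.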

Accordingly, even though a plurality of keywords expressing the context of a composition are used separately in a hypertext document and in a plurality of parent documents having a reference relationship with the hypertext document, a combined particular hypertext document obtained by combining one particular hypertext document with one particular parent document is formed for each of the particular parent documents, and the maximum estimated value among those of the combined particular hypertext documents is set as the estimated value for the particular hypertext document. Therefore, the particular hypertext document is not undesirably ranked to a lower class.

(Seventh Embodiment) 

A heading portion of a hypertext document normally indicates the features of the hypertext document very well. Therefore, to weight heavily a particular word appearing in the heading portion of the hypertext document, the occurrence frequency of a particular word agreeing with one keyword in the heading portion of the hypertext document is doubled. In this embodiment, a title of the hypertext document and an anchor sentence of a parent document having a reference relationship with the hypertext document are considered as examples of the heading portion.

Fig. 14 is a block diagram of a hypertext retrieving apparatus according to a seventh embodiment of the present invention.

As shown in Fig. 14, a hypertext retrieving apparatus 61 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 62 for unifying one particular hypertext document and one or more particular parent documents corresponding to the particular hypertext document into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, for each of the particular hypertext documents obtained in the retrieving unit 3, calculating an occurrence frequency TF of one particular word in one unified particular hypertext document for each particular word and each unified particular hypertext document on condition that the occurrence frequency of the particular word appearing in a heading portion of the unified particular hypertext document is doubled, calculating an inverse document frequency IDF, defined as the inverse of the number of particular hypertext documents in which one particular word appears, for each particular word, calculating a product TF*IDF of one occurrence frequency TF and one inverse document frequency IDF, summing the products for all particular words to produce a summed product as an estimated value for each particular hypertext document, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents and preparing an index of one particular hypertext document for each of the particular hypertext documents, and

a retrieval result displaying unit 63 for displaying the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 62 as a retrieval result.
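The TF*IDF estimated value described for the document ranking determining unit 62 can be sketched in Python as follows. The sketch assumes that the (already weighted) occurrence frequencies of each unified particular hypertext document are available as a `Counter`, and it takes IDF literally as the inverse of the number of particular hypertext documents containing the word; many other TF-IDF variants use a logarithm such as log(N/df) instead. The function names and counts are hypothetical.

```python
from collections import Counter


def idf(word: str, docs: dict[str, Counter]) -> float:
    """Inverse of the number of particular hypertext documents containing `word`."""
    df = sum(1 for counts in docs.values() if counts[word] > 0)
    return 1.0 / df if df else 0.0


def estimated_values(docs: dict[str, Counter], keywords: list[str]) -> dict[str, float]:
    """Estimated value per document: sum over the keywords of TF * IDF."""
    idfs = {w: idf(w, docs) for w in keywords}
    return {doc_id: sum(counts[w] * idfs[w] for w in keywords)
            for doc_id, counts in docs.items()}


def rank(docs: dict[str, Counter], keywords: list[str]) -> list[str]:
    """Document identifiers in decreasing order of estimated value."""
    scores = estimated_values(docs, keywords)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical TF counts of unified particular hypertext documents.
docs = {
    "D83": Counter({"apple": 6, "aomori": 1}),
    "D85": Counter({"apple": 2}),
    "D86": Counter({"apple": 1}),
}
print(rank(docs, ["apple", "aomori"]))  # ['D83', 'D85', 'D86']
```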

In the above configuration, the heading portion of each unified particular hypertext document is composed of the title of the particular hypertext document corresponding to the unified particular hypertext document and the one or more anchor sentences of the particular parent documents having a reference relationship with the particular hypertext document. For example, in cases where a particular word agreeing with one keyword appears six times in one unified particular hypertext document, three of which are in the heading portion of the unified particular hypertext document, each occurrence of the particular word in the heading portion is double-counted, so that the occurrence frequency TF of the particular word in the unified particular hypertext document is 3 × 2 + 3 = 9. Thereafter, the particular hypertext document corresponding to the unified particular hypertext document is ranked according to the occurrence frequency TF = 9.
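The heading weighting in the example above can be reproduced with a short Python sketch. It assumes that the title, the anchor sentences of the parent documents and the body text are available as separate strings; the function names and sample strings are hypothetical. The `heading_weight` parameter (2 by default) also covers the variant, mentioned for this embodiment, in which heading occurrences are multiplied by three or more.

```python
import re


def count_word(text: str, word: str) -> int:
    """Whole-word, case-insensitive occurrence count."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE))


def weighted_tf(title: str, anchor_sentences: list[str], body: str,
                word: str, heading_weight: int = 2) -> int:
    """TF in which each occurrence in the heading portion (title plus parent
    anchor sentences) counts `heading_weight` times and each body occurrence once."""
    heading_hits = count_word(title, word) + sum(count_word(a, word) for a in anchor_sentences)
    body_hits = count_word(body, word)
    return heading_weight * heading_hits + body_hits


title = "Apple"
anchors = ["apple farms", "Aomori apple"]
body = "An apple a day. Apple pie. One more apple."
print(weighted_tf(title, anchors, body, "apple"))                    # 9
print(weighted_tf(title, anchors, body, "apple", heading_weight=3))  # 12
```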

Accordingly, because the heading portion of a hypertext document normally indicates the features of the hypertext document very well and the particular word appearing in the heading portion of the unified particular hypertext document is double-counted, the reliability of the ranking of the particular hypertext documents can be further improved.

In an HTML hypertext document written in the hypertext mark-up language, a small index is expressed by a character string surrounded by "<h1>" and "</h1>". Therefore, the small index may also be included in the heading portion of the HTML hypertext document.

In this embodiment, the occurrence frequency of the particular word appearing in the heading portion of the unified particular hypertext document is doubled. However, the occurrence frequency of the particular word may instead be multiplied by three or more.

(Eighth Embodiment) 

Among the hypertext documents of the world wide web, there are special hypertext documents in which a number of anchor sentences exist and no other sentences exist. Such a special hypertext document is generally called a link page. Even though a link page is retrieved and displayed, it contains no useful information meeting the user's retrieval intention. Therefore, in this embodiment, the occurrence frequency of a particular word in the link page is lowered to zero.

Fig. 15 is a block diagram of a hypertext retrieving 
apparatus according to an eighth embodiment of the 
present invention. 

As shown in Fig. 15, a hypertext retrieving apparatus 71 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,

a document ranking determining unit 72 for unifying one particular hypertext document and one or more particular parent documents corresponding to the particular hypertext document into a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7, for each of the particular hypertext documents obtained in the retrieving unit 3, specifying a link page from among the particular hypertext documents, calculating an occurrence frequency TF of one particular word in one unified particular hypertext document for each particular word and each unified particular hypertext document on condition that the occurrence frequency of the particular word in the link page is reduced by one each time the particular word is found in the link page treated as one particular parent document of the unified particular hypertext document, calculating an inverse document frequency IDF, defined as the inverse of the number of particular hypertext documents in which one particular word appears, for each particular word, calculating a product TF*IDF of one occurrence frequency TF and one inverse document frequency IDF, summing the products for all particular words to produce a summed product as an estimated value for each particular hypertext document, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees of the unified particular hypertext documents and preparing an index of one particular hypertext document for each of the particular hypertext documents, and

a retrieval result displaying unit 73 for displaying the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 72 as a retrieval result.

In the above configuration, the hypertext document D82 is, for example, a link page relating to the particular word "apple" and is composed of ten anchor sentences. Therefore, there exist ten reference documents, each having a reference relationship with the hypertext document D82. When the occurrence frequency of the particular word "apple" is calculated for a unified particular hypertext document composed of one reference document, treated as a particular hypertext document, and the hypertext document D82, treated as a particular parent document, the occurrence frequency of the particular word "apple" in the hypertext document D82, treated as a particular hypertext document in its own right, is reduced by one each time the particular word "apple" is found in the particular parent document D82. This reducing operation is performed for all reference documents treated as particular hypertext documents.

Therefore, even though the particular word "apple" appears in the hypertext document D82 many times, the occurrence frequency of the particular word "apple" in the hypertext document D82 is necessarily reduced to zero, and the hypertext document D82 is ranked to the lowest class.

Accordingly, any particular hypertext document functioning as a link page can always be ranked to the lowest class.
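The reducing operation for a link page can be illustrated with a minimal Python sketch. It assumes that the link page has already been specified and that the anchor sentence pointing to each reference document is available as a string; clamping the result at zero is an assumption added for safety, and the names and sample data are hypothetical. How a link page is detected is not shown.

```python
import re


def count_word(text: str, word: str) -> int:
    """Whole-word, case-insensitive occurrence count."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE))


def penalised_link_page_tf(link_page_tf: int, anchor_sentences: list[str], word: str) -> int:
    """Reduce the link page's own TF by one for each occurrence of `word`
    found in the anchor sentence that points to each reference document."""
    tf = link_page_tf
    for anchor in anchor_sentences:      # one anchor sentence per reference document
        tf -= count_word(anchor, word)   # reduced by one each time the word is found
    return max(tf, 0)                    # clamping at zero is an assumption


# Hypothetical link page D82: ten anchor sentences, each mentioning "apple" once.
anchors = [f"apple page {i}" for i in range(1, 11)]
own_tf = sum(count_word(a, "apple") for a in anchors)   # 10
print(penalised_link_page_tf(own_tf, anchors, "apple"))  # 0
```

Because every occurrence of the word in a pure link page sits in an anchor sentence, the reduction always reaches zero; a page with additional body text would keep the occurrences outside its anchors.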



(Ninth Embodiment) 

There is a long hypertext document composed of a plurality of blocks each corresponding to one meaning, in which a reference label is arranged at the top of each block. In this embodiment, the long hypertext document is divided into the plurality of blocks, and a hypertext document table corresponding to each block of the long hypertext document is prepared.

Fig. 16 is a block diagram of a hypertext retrieving apparatus according to a ninth embodiment of the present invention.

As shown in Fig. 16, a hypertext retrieving apparatus 76 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises

a hypertext document table with parent document list preparing unit 77 for analyzing the hypertext documents having the reference relationships which are managed by the hypertext document managing unit 8, specifying a long hypertext document composed of a plurality of blocks each corresponding to one meaning, setting each block of the long hypertext document as one hypertext document corresponding to one meaning, preparing hypertext document information in which one or more parent document identifiers identifying one or more parent documents and the anchor sentences of the parent documents are listed together with one hypertext document identifier identifying one hypertext document and a document storing position of the hypertext document, for each of the hypertext documents, and preparing a hypertext document table of the hypertext document information for all hypertext documents managed by the hypertext document managing unit 8,
the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3, the document ranking determining unit 4 and the retrieval result displaying unit 73.

In the above configuration, as shown in Fig. 17, in cases where a long hypertext document D87 composed of a plurality of blocks each corresponding to one meaning exists among the hypertext documents managed by the hypertext document managing unit 8, the long hypertext document D87 is specified by the hypertext document table with parent document list preparing unit 77, and the one or more reference labels, each arranged at the top of one block of the long hypertext document D87, are found. Thereafter, the long hypertext document D87 is divided into the plurality of blocks, and each block of the long hypertext document D87 is set as one hypertext document D87, D88 or D89. In this case, when the user reads a character string "ABC" or "XYZ" of an anchor sentence of one hypertext document, the user can immediately refer to the reference label such as "#ABC" or "#XYZ" of another hypertext document. Thereafter, a hypertext document table of the hypertext document information for all hypertext documents is prepared in the same manner as in the first embodiment.
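The division of a long hypertext document at its reference labels can be sketched in Python as follows. The sketch assumes that each reference label is written in HTML as `<a name="LABEL">` at the top of a block; the identifier scheme `doc_id#LABEL` is hypothetical and merely stands in for the new identifiers (such as D88 and D89) that the apparatus assigns. Handling the small-index variant would mean extending the pattern to also match `<h1>`.

```python
import re

# Assumption: a reference label is written as <a name="LABEL"> at the top of a block.
LABEL = re.compile(r'<a\s+name="([^"]+)"\s*>', re.IGNORECASE)


def split_into_blocks(doc_id: str, html_text: str) -> dict[str, str]:
    """Split a long document at each reference label. Non-empty text before the
    first label keeps `doc_id`; each labelled block gets the id doc_id#LABEL."""
    matches = list(LABEL.finditer(html_text))
    blocks: dict[str, str] = {}
    head_end = matches[0].start() if matches else len(html_text)
    if html_text[:head_end].strip():
        blocks[doc_id] = html_text[:head_end]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html_text)
        blocks[f"{doc_id}#{m.group(1)}"] = html_text[m.start():end]
    return blocks


long_doc = ('Overview of apples. <a name="ABC">ABC section</a> text about ABC. '
            '<a name="XYZ">XYZ section</a> text about XYZ.')
for block_id in split_into_blocks("D87", long_doc):
    print(block_id)
# D87
# D87#ABC
# D87#XYZ
```

Each resulting block would then receive its own entry in the hypertext document table, exactly as an ordinary hypertext document does.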

Accordingly, even though a long hypertext document composed of a plurality of blocks each corresponding to one meaning exists among the hypertext documents, the long hypertext document is divided into the blocks and each block is set as one hypertext document for which hypertext document information is prepared. The hypertext documents, each relating to one meaning, can therefore be ranked, so that the user can easily retrieve a group of hypertext documents likely to meet the user's request.

In this embodiment, in cases where a small index expressed by a character string surrounded by "<h1>" and "</h1>" is used in a long hypertext document, the long hypertext document may be divided into a plurality of blocks on condition that one reference label or one small index is arranged at the top of each block.

(Tenth Embodiment) 

In cases where the user intends to retrieve hypertext documents again by changing the initial keyword to another keyword relating to the particular hypertext documents displayed for the initial keyword, the user generally wants to know one or more words frequently appearing in those particular hypertext documents. Therefore, in this embodiment, one or more words frequently appearing in the particular hypertext documents are displayed.

Fig. 18 is a block diagram of a hypertext retrieving 
apparatus according to a tenth embodiment of the 
present invention. 

As shown in Fig. 18, a hypertext retrieving apparatus 91 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises

the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,
a document ranking determining unit 92 for unifying 
one particular hypertext document and one or more so 
particular parent documents corresponding io the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent ss 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating an occurrence frequency TF of 
one particular word in one unified particular hyper- 



text document for each particular word and each 
unified particular hypertext document, calculating 
an inverse document frequency IDF defined as an 
inverse value of the number of particular hypertext 
documente, in which one particular word appears, 
for eadi particular word, calculating a product 
TF*IDF of one occurrence frequency TF and one 
inverse document frequency IDF, summing a plural- 
ity of products for all particular words to produce a 
aunwned product as an estimated value for each 
particular hypertext document, determining a plu- 
rality of importance degrees of the unified particular 
hypertext documents according to the estimated 
values, determining the ranking of the particular 
hypertext documents according to toe importance 
degrees for the unified particular hypertext docu- 
ments, preparing an index of one particular hyper- 
text document for each of the particular hypertext 
documents, selecting a plurality of high-ranking 
hypertext documents from the particular hypertext 
documents, extracting a plurality of related words 
listed in a plurality of word lists of pieces of hyper- 
text document information of toe hypertext docu- 
ment table corresponding to the high-ranking 
hypertext documents, calculating an occurrence frequency TF of one related word in one high-ranking hypertext document for each related word and each high-ranking hypertext document, calculating an inverse document frequency IDF defined as an inverse value of the number of high-ranking hypertext documents in which one related word appears, for each related word, calculating a sum of a plurality of products TF*IDF for all high-ranking hypertext documents to produce a summed product as an importance degree for each related word, comparing the importance degrees of the related words with each other, selecting a plurality of high-ranking related words of which the importance degrees are higher than those of other related words, and preparing a hypertext mark-up language (HTML) document in which a plurality of keyword selection buttons corresponding to the high-ranking related words are arranged in the decreasing order of the importance degrees of the high-ranking related words to select one high-ranking related word by pushing one keyword selection button, and
a retrieval result displaying unit 93 for displaying the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 92 as a retrieval result on a result displaying window W1 and displaying the HTML document prepared by the document ranking determining unit 92 on a high-ranking related word selecting window W2.
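The document-scoring part of the document ranking determining unit 92 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the apparatus itself: documents and parent documents are given as hypothetical pre-tokenized word lists, the function names are invented for illustration, the document frequency is counted over the unified documents (an assumption), and IDF is taken literally as the inverse of that count (1/df) rather than the logarithmic form used in many other TF*IDF systems.

```python
from collections import Counter


def unify(doc_tokens, parent_token_lists):
    """Concatenate a document's words with the words of its parent documents."""
    unified = list(doc_tokens)
    for parent_tokens in parent_token_lists:
        unified.extend(parent_tokens)
    return unified


def rank_documents(keywords, docs, parents):
    """Rank documents by the summed TF*IDF of the keywords in their unified documents.

    docs:    hypothetical mapping doc_id -> list of words in the document body
    parents: hypothetical mapping doc_id -> list of word lists, one per parent document
    """
    # Occurrence frequency TF of every word in each unified document.
    unified = {d: Counter(unify(toks, parents.get(d, []))) for d, toks in docs.items()}

    # Assumption: document frequency is counted over the unified documents.
    df = {w: sum(1 for tf in unified.values() if tf[w] > 0) for w in keywords}

    scores = {}
    for d, tf in unified.items():
        # IDF taken literally as 1/df; keywords absent everywhere contribute nothing.
        scores[d] = sum(tf[w] / df[w] for w in keywords if df[w])

    # Decreasing estimated value; ties broken by identifier for a stable order.
    order = sorted(scores, key=lambda d: (-scores[d], d))
    return order, scores
```

For example, with `docs = {"doc1": ["apple", "pie"], "doc2": ["apple", "apple"], "doc3": ["cider"]}` and `parents = {"doc3": [["apple", "farm"]]}`, `rank_documents(["apple"], docs, parents)` ranks `doc2` first, and `doc3` still receives a non-zero score even though "apple" occurs only in its parent document, which is the effect the unification is meant to produce.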

In the above configuration, in cases where the tenth embodiment and the third embodiment are combined, as shown in Fig. 19, when a keyword "apple" is input to the keyword input unit 2, a plurality of indexes of particular hypertext documents such as documents D83, D85 and D86 and a plurality of indexes of parent documents such as documents D80 and D81 are, for example, displayed on the result displaying window W1 in the same manner as in the third embodiment. Thereafter, in the document ranking determining unit 92, ten high-ranking hypertext documents are selected from the particular hypertext documents, a plurality of related words listed in a plurality of word lists of pieces of hypertext document information of the hypertext document table corresponding to the high-ranking hypertext documents are extracted, a sum of a plurality of products TF*IDF for all high-ranking hypertext documents is calculated for each related word, and importance degrees for the related words are determined. Thereafter, ten high-ranking related words "Shinshu", "farmer", "product", "Aomori", "manure", "farm", "festival", "Nebuta", "Nagano" and "Olympics" are selected from the related words, an HTML document in which ten keyword selection buttons corresponding to the high-ranking related words are arranged in the decreasing order of the importance degrees of the high-ranking related words is prepared, and the HTML document is displayed on the high-ranking related word selecting window W2.
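The related-word selection step described above can be sketched as follows. It is a minimal sketch under assumptions: the word lists are a hypothetical mapping from document identifier to words, the input keyword is excluded from the candidates through a hypothetical `exclude` parameter (the source does not say so explicitly), and IDF is again taken literally as 1/df over the high-ranking documents.

```python
from collections import Counter


def select_related_words(ranked_ids, word_lists, top_docs=10, top_words=10, exclude=()):
    """Pick high-ranking related words from the top-ranked documents.

    ranked_ids: document ids already sorted by decreasing importance degree
    word_lists: hypothetical mapping doc_id -> list of words in that document
    exclude:    words to skip, e.g. the input keyword (an assumption)
    """
    high = ranked_ids[:top_docs]
    tfs = [Counter(word_lists[d]) for d in high]

    # Number of high-ranking documents in which each word appears.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())

    importance = {}
    for word, n in df.items():
        if word in exclude:
            continue
        # Sum over the high-ranking documents of TF * (1 / df).
        importance[word] = sum(tf[word] for tf in tfs) / n

    # Decreasing importance degree; ties broken alphabetically.
    ranked_words = sorted(importance, key=lambda w: (-importance[w], w))
    return ranked_words[:top_words], importance
```

Taken literally, the 1/df weighting makes each importance degree equal to the word's average count over the high-ranking documents that contain it; a log-scaled IDF would be a different design choice and would change the ordering.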

Therefore, when the user pushes the keyword selection button corresponding to the high-ranking related word "Shinshu", the word "Shinshu", indicating an apple-producing district, is input to the keyword input unit 2 as a keyword, importance degrees of a plurality of particular hypertext documents corresponding to the keyword "Shinshu" are determined, and the particular hypertext documents arranged in the decreasing order of the importance degrees are displayed on the result displaying window W1 in the same manner as in the first embodiment.

Accordingly, even though the user cannot initially think of an appropriate keyword, the user can select one or more keywords closer to his retrieval intention. Also, the user can change his retrieval intention by referring to the high-ranking related words, and a plurality of particular hypertext documents corresponding to a new keyword selected by the user according to his new retrieval intention can be displayed.

In this case, the user can push a keyword selection button by using a pointing device without using a keyboard. The keyword selection buttons are implemented by a JavaScript script that adds the corresponding high-ranking related word to a text box; a "clear" button is implemented by a JavaScript script that clears one high-ranking related word added to the text box; an "initial condition" button is implemented by a JavaScript script that returns the words in the text box to the initial group of keywords, such as "apple"; and a "re-retrieval" button is implemented by a JavaScript script that performs the retrieval operation again, using the one or more words in the text box as keywords.
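A page with these buttons could be generated roughly as below. This is only a sketch: the element id `kw`, the form action `/search`, the query parameter `q` and the JavaScript function names `addWord`, `clearLast` and `resetInitial` are all hypothetical, and "clear" is assumed to remove the most recently added word. The "initial condition" button relies on the standard `defaultValue` property of an `<input>` element, which holds the value written in its `value` attribute.

```python
import html
import json

# Hypothetical client-side behaviour for the buttons described above.
SCRIPT = """<script>
function box() { return document.getElementById("kw"); }
function addWord(w) { var b = box(); b.value = b.value ? b.value + " " + w : w; }
function clearLast() { var b = box(); var p = b.value.split(" "); p.pop(); b.value = p.join(" "); }
function resetInitial() { var b = box(); b.value = b.defaultValue; }
</script>"""


def build_selection_page(related_words, initial_keywords, action="/search"):
    """Return an HTML page with one button per related word, in the given order."""
    buttons = [
        # json.dumps quotes the word for JavaScript; html.escape makes it attribute-safe.
        '<button type="button" onclick="addWord(%s)">%s</button>'
        % (html.escape(json.dumps(w), quote=True), html.escape(w))
        for w in related_words  # assumed already sorted by decreasing importance
    ]
    initial = html.escape(" ".join(initial_keywords), quote=True)
    return "\n".join([
        "<html><head>",
        SCRIPT,
        "</head><body>",
        '<form action="%s" method="get">' % html.escape(action, quote=True),
        *buttons,
        '<input type="text" id="kw" name="q" value="%s">' % initial,
        '<button type="button" onclick="clearLast()">clear</button>',
        '<button type="button" onclick="resetInitial()">initial condition</button>',
        '<button type="submit">re-retrieval</button>',
        "</form></body></html>",
    ])
```

Calling `build_selection_page(["Shinshu", "farmer"], ["apple"])` yields a page whose text box starts with "apple" and whose buttons appear in the order given, so the ordering by importance degree is decided entirely on the server side.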

In this embodiment, the high-ranking hypertext documents are selected from the particular hypertext documents. However, the high-ranking hypertext documents may instead be selected from both the particular hypertext documents and the parent documents. In this case, related words can be collected more widely from a plurality of hypertext documents having a reference relationship with each other.

(Eleventh Embodiment) 

In the tenth embodiment, the importance degrees of the related words are determined without any connection with the keyword initially input by the user. However, in cases where the user desires to select a related word having a close correlation with the keyword, it is preferable that such a related word be preferentially selected as a high-ranking related word. Therefore, in this embodiment, the occurrence frequency of a related word having a close correlation with a keyword is doubled to heighten the importance degree of the related word.

Fig. 20 is a block diagram of a hypertext retrieving apparatus according to an eleventh embodiment of the present invention.

As shown in Fig. 20, a hypertext retrieving apparatus 101 for retrieving one or more hypertext documents likely to meet a user's retrieval request from a large volume of hypertext documents stored in the hypertext document managing unit 8, comprises

the hypertext document table with parent document list preparing unit 7, the retrieval index preparing unit 6, the keyword input unit 2, the retrieving unit 3,
a document ranking determining unit 102 for unifying one particular hypertext document and one or more particular parent documents corresponding to the particular hypertext document to a unified particular hypertext document according to the document information of the hypertext document table prepared by the hypertext document table with parent document list preparing unit 7 for each of the particular hypertext documents obtained in the retrieving unit 3, calculating an occurrence frequency TF of one particular word in one unified particular hypertext document for each particular word and each unified particular hypertext document, calculating an inverse document frequency IDF defined as an inverse value of the number of particular hypertext documents in which one particular word appears, for each particular word, calculating
a product TF*IDF of one occurrence frequency TF and one inverse document frequency IDF, summing a plurality of products for all particular words to produce a summed product as an estimated value for each particular hypertext document, determining a plurality of importance degrees of the unified particular hypertext documents according to the estimated values, determining the ranking of the particular hypertext documents according to the importance degrees for the unified particular hypertext documents, preparing an index of one particular hypertext document for each of the particular hypertext documents, selecting a plurality of high-ranking hypertext documents from the particular hypertext documents, extracting a plurality of related words listed in a plurality of word lists of pieces of hypertext document information of the hypertext document table corresponding to the high-ranking hypertext documents, calculating an
occurrence frequency TF of one related word in one high-ranking hypertext document for each related word and each high-ranking hypertext document on condition that the related word is double-counted when the related word is placed within a distance of 40 letters from one keyword, calculating an inverse document frequency IDF defined as an inverse value of the number of high-ranking hypertext documents in which one related word appears, for each related word, calculating a sum of a plurality of products TF*IDF for all high-ranking hypertext documents to produce a summed product as an importance degree for each related word, comparing the importance degrees of the related words with each other, selecting a plurality of high-ranking related words of which the importance degrees are higher than those of other related words, and preparing a hypertext mark-up language (HTML) document in which a plurality of keyword selection buttons corresponding to the high-ranking related words are arranged in the decreasing order of the importance degrees of the high-ranking related words to select one high-ranking related word by pushing one keyword selection button, and
a retrieval result displaying unit 103 for displaying the indexes of the particular hypertext documents in the ranked order determined in the document ranking determining unit 102 as a retrieval result on a result displaying window W1 and displaying the HTML document prepared by the document ranking determining unit 102 on a high-ranking related word selecting window W2.



In the above configuration, after the related words are extracted in the same manner as in the tenth embodiment, an occurrence frequency TF of one related word in one high-ranking hypertext document is calculated for each related word and each high-ranking hypertext document. In this case, when the related word is placed within a distance of 40 letters from the keyword "apple", the related word is double-counted. The related word "Shinshu", indicating an apple-producing district, and the related word "farmer" often appear within a distance of 40 letters from the keyword "apple", whereas the related word "Nagano", indicating an apple-producing prefecture, and the related word "Olympics", indicating a festival held in Nagano in 1998, hardly ever appear within that distance of the keyword "apple". Therefore, as shown in Fig. 21, the related words "Shinshu" and "farmer" are reliably displayed at the head of the high-ranking related word selecting window W2, and the related words "Nagano" and "Olympics" are displayed at the rear of the window W2 even though they frequently appear in the particular hypertext documents.
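The double-counting rule can be sketched as a modified occurrence count that would then feed the same TF*IDF summation as in the tenth embodiment. This is a minimal sketch under assumptions: the text is space-delimited, matching is whole-word and case-sensitive, "within 40 letters" is measured between the start offsets of the two words, and the function name `proximity_tf` is hypothetical. The source does not fix any of these conventions.

```python
import re


def proximity_tf(text, related_word, keyword, window=40):
    """Count occurrences of related_word, double-counting those near keyword.

    Assumption: distance is measured between the start offsets of the words.
    """
    def starts(word):
        # Whole-word, case-sensitive match positions.
        return [m.start() for m in re.finditer(r"\b%s\b" % re.escape(word), text)]

    keyword_positions = starts(keyword)
    tf = 0
    for pos in starts(related_word):
        near = any(abs(pos - k) <= window for k in keyword_positions)
        tf += 2 if near else 1  # double-count when close to the keyword
    return tf
```

With `text = "Shinshu apple orchards. " + "x" * 60 + " Shinshu"`, the first "Shinshu" lies 8 letters from "apple" and counts twice, while the second lies 77 letters away and counts once, giving 3.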

Accordingly, one or more related words having a strong relationship with the keyword can be displayed in high-ranking positions, and one or more related words corresponding to a retrieval intention differing from the user's initial retrieval intention can be displayed in low-ranking positions.

Having illustrated and described the principles of 
the present invention in a preferred embodiment 
thereof, it should be readily apparent to those skilled in 
the art that the invention can be modified in arrangement
and detail without departing from such principles.
We claim all modifications coming within the scope of
the accompanying claims. 

Claims 

1. A hypertext document retrieving apparatus for retrieving a plurality of particular hypertext documents likely to meet a user's retrieval request from a group of hypertext documents having reference relationships with each other, in which one hypertext document having an anchor sentence functions as a parent document for another hypertext document functioning as a reference document, and a user refers to one reference document after the user selects one anchor sentence of one parent document corresponding to the reference document, comprising:

hypertext document table preparing means for preparing hypertext document information, in which one hypertext document identifier identifying one hypertext document, a body of the hypertext document, a parent document identifier identifying a parent document corresponding to the hypertext document functioning as one reference document and an anchor sentence of the parent document are registered, for each of the hypertext documents and preparing a hypertext document table of the hypertext document information for the hypertext documents;

retrieval index preparing means for recognizing a plurality of words appearing in each of the hypertext documents and the parent documents according to the hypertext document table prepared by the hypertext document table preparing means, recognizing a plurality of occurrence positions of the words in each of the hypertext documents and the parent documents according to the hypertext document table, preparing word information, composed of one or more occurrence document identifiers identifying one or more hypertext documents in which one word appears and occurrence positions of the word in the hypertext documents and one or more anchor sentences of one or more parent documents corresponding to the hypertext documents, for each of the words, and preparing a retrieval index of pieces of word information for the words;
keyword receiving means for receiving a keyword indicating the user's retrieval request;

retrieving means for retrieving particular word information corresponding to the keyword received by the keyword receiving means from the retrieval index prepared by the retrieval index preparing means and retrieving a plurality of particular occurrence document identifiers identifying a plurality of particular hypertext documents in which the keyword appears and a plurality of particular occurrence positions of the keyword in the particular hypertext documents and one or more particular anchor sentences of one or more particular parent documents corresponding to the particular hypertext documents from the particular word information;

document ranking determining means for specifying the particular hypertext documents which are identified by the particular occurrence document identifiers retrieved by the retrieving means, retrieving pieces of particular hypertext document information for the particular hypertext documents from the hypertext document table prepared by the hypertext document table preparing means, unifying one particular hypertext document and one or more particular parent documents corresponding to the particular hypertext document to a unified hypertext document for each of the particular hypertext documents, calculating an occurrence frequency of the keyword in one unified hypertext document for each unified hypertext document, determining a plurality of importance degrees of the unified hypertext documents according to the occurrence frequencies in the unified hypertext documents, setting one importance degree of one unified hypertext document as an importance degree of one particular hypertext document corresponding to the unified hypertext document for each unified hypertext document, and determining the ranking of the particular hypertext documents according to the importance degrees of the particular hypertext documents; and

retrieval result displaying means for displaying a plurality of indexes of the particular hypertext documents in a ranked order corresponding to the ranking of the particular hypertext documents determined by the document ranking determining means as a retrieval result.

2. A hypertext document retrieving apparatus according to claim 1 in which an index of one particular parent document corresponding to one particular hypertext document is displayed with the index of the particular hypertext document by the retrieval result displaying means for each of the particular hypertext documents.

3. A hypertext document retrieving apparatus according to claim 1 in which a plurality of particular hypertext documents corresponding to the same particular parent document are reset to the same rank as the highest rank among the ranks determined for the particular hypertext documents by the document ranking determining means, and the particular hypertext documents set to the same rank are displayed with the particular parent document in a group by the retrieval result displaying means.

4. A hypertext document retrieving apparatus according to claim 1 in which a plurality of particular hypertext documents corresponding to the same particular parent document are reset to the same rank according to a sum of the importance degrees for the particular hypertext documents by the document ranking determining means, and the particular hypertext documents set to the same rank are displayed with the particular parent document in a group by the retrieval result displaying means.

5. A hypertext document retrieving apparatus according to claim 1 in which each of the unified hypertext documents is formed by the document ranking determining means by unifying one or more anchor sentences of one or more particular parent documents corresponding to one particular hypertext document and the particular hypertext document.

6. A hypertext document retrieving apparatus according to claim 1 in which a particular sentence or a particular phrase including the keyword is extracted from each of the particular hypertext documents by the document ranking determining means, and a summary in which one particular sentence or one particular phrase of one particular hypertext document is written in succession to a top sentence or a top phrase of the particular hypertext document is displayed with the index of the particular hypertext document for each of the particular hypertext documents.

7. A hypertext document retrieving apparatus according to claim 1 in which the importance degree of each of the unified hypertext documents is determined by the document ranking determining means by calculating a sum of an occurrence frequency of the keyword in one hypertext document and an occurrence frequency of the keyword in one parent document corresponding to the hypertext document for each of the parent documents corresponding to the hypertext document, selecting a maximum sum among the sums for the parent documents, specifying one particular parent document corresponding to the maximum sum, determining one importance degree for a combination of the hypertext document and the particular parent document according to the maximum sum, and regarding the importance degree as one importance degree of one unified hypertext document corresponding to the hypertext document.

8. A hypertext document retrieving apparatus according to claim 1 in which the occurrence frequency of the keyword in each unified hypertext document is calculated by the document ranking determining means by double-counting the keyword appearing in one or more anchor sentences of one or more particular parent documents corresponding to the unified hypertext document.

9. A hypertext document retrieving apparatus according to claim 1 in which the occurrence frequency of the keyword in one hypertext document functioning as a link page composed of one or more anchor sentences is set to zero by the document ranking determining means.

10. A hypertext document retrieving apparatus according to claim 1 in which one hypertext document having contents corresponding to a plurality of meanings respectively identified by a reference label is divided into a plurality of blocks by the hypertext document table preparing means so as to include one reference label at the top of each block, and one piece of hypertext document information is prepared for each block of the hypertext document by the hypertext document table preparing means.

11. A hypertext document retrieving apparatus according to claim 1 in which a predetermined number of high-ranking particular hypertext documents are selected from among the particular hypertext documents by the document ranking determining means, a plurality of related words appearing in the high-ranking particular hypertext documents are extracted from the high-ranking particular hypertext documents by the document ranking determining means, a plurality of importance degrees of the related words are calculated from a plurality of occurrence frequencies of the related words in the high-ranking particular hypertext documents by the document ranking determining means, a predetermined number of high-ranking related words are selected from the related words ranked according to the importance degrees of the related words by the document ranking determining means, and a plurality of selection buttons for the high-ranking related words are displayed with the indexes of the particular hypertext documents by the retrieval result displaying means.

12. A hypertext document retrieving apparatus according to claim 1 in which a predetermined number of high-ranking particular hypertext documents are selected from among the particular hypertext documents by the document ranking determining means, a plurality of related words appearing in the high-ranking particular hypertext documents and a plurality of particular parent documents corresponding to the high-ranking particular hypertext documents are extracted from the high-ranking particular hypertext documents by the document ranking determining means, a plurality of importance degrees of the related words are calculated from a plurality of occurrence frequencies of the related words in the high-ranking particular hypertext documents and the particular parent documents by the document ranking determining means, a predetermined number of high-ranking related words are selected from the related words ranked according to the importance degrees of the related words by the document ranking determining means, and a plurality of selection buttons for the high-ranking related words are displayed with the indexes of the particular hypertext documents by the retrieval result displaying means.

13. A hypertext document retrieving apparatus according to claim 1 in which a predetermined number of high-ranking particular hypertext documents are selected from among the particular hypertext documents by the document ranking determining means, a plurality of related words appearing in the high-ranking particular hypertext documents are extracted from the high-ranking particular hypertext documents by the document ranking determining means, an occurrence frequency of each related word in the high-ranking particular hypertext documents is calculated by the document ranking determining means on condition that the related word appearing in one high-ranking particular hypertext document is double-counted in cases where an occurrence position of the related word is near an occurrence position of the keyword, a plurality of importance degrees of the related words are calculated from the occurrence frequencies of the related words by the document ranking determining means, a predetermined number of high-ranking related words are selected from the related words ranked according to the importance degrees of the related words by the document ranking determining means, and a plurality of selection buttons for the high-ranking related words are displayed with the indexes of the particular hypertext documents by the retrieval result displaying means.
14. A hypertext document retrieving apparatus according to claim 1 in which a predetermined number of high-ranking particular hypertext documents are selected from among the particular hypertext documents by the document ranking determining means, a plurality of related words appearing in the high-ranking particular hypertext documents and a plurality of particular parent documents corresponding to the high-ranking particular hypertext documents are extracted from the high-ranking particular hypertext documents by the document ranking determining means, an occurrence frequency of each related word in the high-ranking particular hypertext documents and the particular parent documents is calculated by the document ranking determining means on condition that the related word appearing in one high-ranking particular hypertext document or one particular parent document is double-counted in cases where an occurrence position of the related word is near an occurrence position of the keyword, a plurality of importance degrees of the related words are calculated from the occurrence frequencies of the related words by the document ranking determining means, a predetermined number of high-ranking related words are selected from the related words ranked according to the importance degrees of the related words by the document ranking determining means, and a plurality of selection buttons for the high-ranking related words are displayed with the indexes of the particular hypertext documents by the retrieval result displaying means.

15. A hypertext document retrieving apparatus according to claim 1 in which a plurality of keywords are received by the keyword receiving means, an occurrence frequency TF of one keyword in one unified hypertext document is calculated by the document ranking determining means for each keyword and each unified hypertext document, an inverse document frequency IDF defined as an inverse value of the number of particular hypertext documents in which one keyword appears is calculated by the document ranking determining means for each keyword, a product TF*IDF of one occurrence frequency TF and one inverse document frequency IDF is calculated by the document ranking determining means, a plurality of products for the keywords are summed by the document ranking determining means to produce a summed product as an estimated value for each unified particular hypertext document, and the importance degrees of the unified hypertext documents are determined according to the estimated values by the document ranking determining means.

16. A hypertext document retrieving apparatus according to claim 15 in which one estimated value for one unified particular hypertext document is increased to heighten the rank of the particular hypertext document in cases where two or more keywords appear in the unified particular hypertext document or the distance between two keywords in the unified particular hypertext document is within a predetermined number of words.
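The importance degree of claim 7 can be illustrated with a small helper. It is a sketch only: the counts are assumed to be precomputed keyword occurrence frequencies, the parent documents are a hypothetical mapping from identifier to count, the importance degree is assumed to equal the maximum sum directly (the claim says only "according to" it), and a document without parents is assumed to keep its own count.

```python
def claim7_importance(doc_count, parent_counts):
    """Return (maximum sum, parent giving that sum) for one hypertext document.

    doc_count:     keyword occurrences in the hypertext document itself
    parent_counts: hypothetical mapping parent_id -> keyword occurrences in that parent
    """
    if not parent_counts:
        # Assumption: with no parent document, the document's own count is used.
        return doc_count, None
    # Maximising doc_count + parent count is the same as picking the largest parent count.
    best_parent = max(parent_counts, key=parent_counts.get)
    return doc_count + parent_counts[best_parent], best_parent
```

For instance, `claim7_importance(2, {"P1": 1, "P2": 3})` pairs the document with `P2` and gives 5. One consequence of pairing with a single parent, rather than unifying all parents as in claim 1, is that a document with many weakly matching parents does not accumulate their counts.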
(54) Hypertext document retrieving apparatus for retrieving hypertext documents relating to each other


